Chapter 1 Prerequisites

Loading

## [1] "C"

Loading of packages

Chapter 2 SHARE data description

The Survey of Health, Ageing and Retirement in Europe (SHARE) collects data on health and socioeconomic variables of non-institutionalized individuals aged 50 and older across 28 European countries and Israel [Börsch-Supan et al.]. The data set includes about 140,000 men and women aged 50 or older, interviewed between 2004 and 2018. This report analyzes Waves 1 to 7, including the data from Waves 3 and 7 (SHARELIFE interviews).

2.1 Subset of the SHARE data analyzed in this project

In this report we analyze the data collected in Denmark, where the sample was drawn by simple random sampling. We also limit the sample to individuals aged 50 or older at first interview.

The general characteristics of the study are described elsewhere (see citations below). Here we report the

2.2 Sampling in Denmark

The study protocol describes the age-related inclusion criteria by wave. Participants in Wave 1 had to be born in 1954 or earlier. The study design planned full-range refreshment samples in Wave 2 (birth year <= 1956) and Wave 5 (birth year <= 1962), and refreshment samples of the youngest cohort only in Wave 4 (birth years 1957-60) and Wave 6 (birth years 1963-64). The full-range refreshment samples include an over-sampling of the youngest cohorts that were not age-eligible in the previous refreshment samples, to maintain the representation of younger cohorts; their aim is to compensate for the effect of panel attrition on all age cohorts.

2.3 Non-enrollment

The characteristics of the individuals who were selected for participation but did not enter the study are not reported in the data set. Limited information is available only in the documentation published by the study (retrievable at ), and a detailed description of the selected sample is not provided.

2.4 Type of questionnaire

Different types of questionnaire were used during the study. By design, the baseline questionnaire is used for the first interview and the longitudinal questionnaire for the follow-up interviews. The questionnaire used in the SHARELIFE interviews (Wave 3 and partly Wave 7) includes only a subset of the questions from the longitudinal questionnaire, plus additional questions about the life history of the participants.

2.5 Changes in data collection process

The questionnaires used in the study contain thousands of questions divided into modules, and they changed several times during the study. The study documentation describes the changes. Note that some questions might provide different types of information depending on the type of questionnaire used, so careful examination of the metadata is required. Given the large number of questions and waves, retrieving all the relevant information can be challenging. Some of this information was used in the analysis strategy (AS); for example, height was used as a time-fixed variable because in most waves it was recorded only in the baseline interviews.

References: Börsch-Supan A, Brandt M, Hunkler C, Kneip T, Korbmacher J, Malter F, et al. Data Resource Profile: The Survey of Health, Ageing and Retirement in Europe (SHARE). Int J Epidemiol. 2013;42: 992–1001

Website: www.share-eric.eu.

SHARE release guide 7.1.1

Chapter 3 IDA plan

3.1 Prerequisites for IDA plan

3.1.1 Analysis strategy

3.1.2 Analysis Ready Data dictionary

3.1.3 Domain expertise

The variables sex, age, interview type (baseline, longitudinal, SHARELIFE) were chosen as structural variables.

3.2 IDA planned analyses

Chapter 4 Data screening

4.1 Participation profile

4.1.1 Time frame of the study (P1)

Here we summarize the times when interviews were taken (by calendar time or Wave).

4.1.1.1 Distribution of the dates of the interviews (by wave)

The graph below shows the distribution of the dates on which the interviews were carried out, stratified by Wave.

4.1.1.2 Time range for each Wave (baseline and longitudinal interviews)

Wave 5 had the most interviews. The number of interviews per Wave is shown below, together with the range of interview dates in each Wave.

The time lag between waves was approximately 2 years, with slightly longer gaps between Waves 1 and 2 and between Waves 3 and 4. The shortest lags were between Waves 2 and 3 and between Waves 5 and 6.

Wave Number of interviews Proportion Begin (date) End (date)
Wave 1 1596 0.09 2004-04-15 2004-11-15
Wave 2 2487 0.13 2006-11-15 2007-08-15
Wave 3 1979 0.11 2008-11-15 2009-08-15
Wave 4 2112 0.11 2011-02-15 2011-08-15
Wave 5 3919 0.21 2013-02-15 2013-11-15
Wave 6 3514 0.19 2015-02-15 2015-11-15
Wave 7 3025 0.16 2017-03-15 2017-10-15

We also summarized the time between interviews conducted in specific Waves at the individual level (graphically and with summary statistics). NA’s indicate individuals for whom at least one of the two interviews was missing. Times are given in years.

The mean and median times between interviews conducted in consecutive Waves were similar and approximately equal to 2 years, except between Waves 1 and 2 and between Waves 3 and 4, where the lag was somewhat longer. The shortest lags were observed between Waves 2 and 3 and between Waves 5 and 6. The largest lags were observed between Waves 1 and 2 (range 2.17 to 3.25 years). The variability was highest between Waves 5 and 6.

Statistic    Wave 2 - Wave 1  Wave 3 - Wave 2  Wave 4 - Wave 3  Wave 5 - Wave 4  Wave 6 - Wave 5  Wave 7 - Wave 6
Min.                    2.17             1.25             1.75             1.67             1.25             1.33
1st Qu.                 2.42             1.67             2.16             1.92             1.66             1.92
Median                  2.50             1.83             2.25             2.08             1.84             2.00
Mean                    2.53             1.83             2.22             2.07             1.84             2.01
3rd Qu.                 2.66             2.00             2.33             2.17             2.00             2.16
Max.                    3.25             2.50             2.66             2.67             2.67             2.66
NA’s                    4267             3477             3864             3632             2266             2563

Standard deviations of the time differences, in months.

Table 4.1: Standard deviation of time difference between measurements, in months
Interval   Wave 2 - Wave 1  Wave 3 - Wave 2  Wave 4 - Wave 3  Wave 5 - Wave 4  Wave 6 - Wave 5  Wave 7 - Wave 6
SD                       2              2.2              1.7              1.9              3.3              2.4
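
The individual time lags and their summaries could be computed along the following lines. This is a minimal Python sketch: the interview dates are made up for illustration, and lag_years is a hypothetical helper, not part of the SHARE data or of the code used for this report.

```python
from datetime import date
from statistics import mean, median, stdev


def lag_years(first, second):
    """Years between two interview dates; None if either interview is missing."""
    if first is None or second is None:
        return None
    return (second - first).days / 365.25


# Hypothetical interview dates (Wave 1, Wave 2) for four participants.
dates = [
    (date(2004, 5, 15), date(2006, 11, 15)),
    (date(2004, 6, 15), date(2007, 1, 15)),
    (date(2004, 9, 15), None),  # no Wave 2 interview: counted among the NA's
    (date(2004, 11, 15), date(2007, 8, 15)),
]

lags = [lag_years(a, b) for a, b in dates]
observed = [x for x in lags if x is not None]

summary = {
    "min": min(observed),
    "median": median(observed),
    "mean": mean(observed),
    "max": max(observed),
    "n_missing": lags.count(None),
    "sd_months": stdev(observed) * 12,  # standard deviation expressed in months
}
```

Dividing by 365.25 is one of several reasonable conventions for converting days to years; the report does not state which convention was used.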

4.1.1.3 Distribution of baseline (first) interviews and longitudinal interviews by wave/calendar time

The participants were followed up longitudinally, and refreshment samples (new participants) were drawn during the study, as planned. The table below shows that no new participants were included in Wave 3 and 7 (SHARELIFE interviews), and that the largest refreshment samples were included in Wave 2 and 5 (planned full range refreshment samples, while Wave 4 and 6 planned the refreshment sample of the youngest cohort only). Results are presented also graphically using calendar time as time metric, where it can be seen that in the waves where both types of questionnaires were used, data from longitudinal questionnaires were generally collected earlier than those from baseline questionnaires.

Table 4.2: Number of interviews by wave, baseline or longitudinal follow-up
Baseline Longitudinal/SHARELIFE
Wave 1 1596 0
Wave 2 1266 1221
Wave 3 0 1979
Wave 4 408 1704
Wave 5 1872 2047
Wave 6 228 3286
Wave 7 0 3025

More details about the refreshment samples and about differences between questionnaires are given in the following sections.

4.1.2 Time metric (P2)

The analysis strategy (AS) defines age as the time metric in the model. Here we describe age, while later (PE1, Other time metrics) we describe in more detail the main characteristics of two additional time metrics: waves and measurement occasions (defined as the number of waves since the first available measurement, plus 1).
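
The definition of measurement occasion can be illustrated with a short Python sketch. The function name and the example participant are hypothetical and serve only to make the definition concrete.

```python
def measurement_occasions(waves_interviewed):
    """Map each wave with an interview to its measurement occasion:
    number of waves since the first available measurement, plus 1."""
    first = min(waves_interviewed)
    return {wave: wave - first + 1 for wave in sorted(waves_interviewed)}


# A participant first interviewed in Wave 2, then in Waves 3 and 5.
occasions = measurement_occasions([2, 3, 5])  # {2: 1, 3: 2, 5: 4}
```

Note that the missed Wave 4 still counts as an occasion (M3), so the Wave 5 interview is measurement occasion M4 rather than the third interview.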

4.1.2.1 Distribution of age

The inclusion criteria specified that age at first interview was at least 50. The sampling design is briefly described in the description of the data.

The distribution of the age of the participants, stratified by Wave (overall, and by baseline or longitudinal interview) is presented graphically.

The overall distribution of age differed somewhat across waves, as did the distribution in the baseline and longitudinal questionnaires, due to the sampling design. The small groups of participants first included in Waves 4 and 6 were, by design, considerably younger than those included in other waves. In Waves 3 and 7, where no refreshment sample was used, it was expected that the distribution of age would be shifted and reflect the 52+ population rather than the 50+ population. Overall, the distribution of age across waves and types of interviews is consistent with the expectations based on the sampling design.

The distribution of ages by wave, stratified by wave of inclusion is shown in the figure below, which presents graphically the aging of the wave cohorts.

The tables below present the summary statistics for age of the observed participants, overall and by sex.

Average age increased somewhat at later waves for both sexes; a similar increase in the average age is also observed in the population (data not shown).

Table 4.3: Distribution of age at interview across waves, overall
Min. 1st Qu. Median Mean 3rd Qu. Max.
Wave 1 50 56 62 64.4 72 97
Wave 2 50 56 63 64.5 72 99
Wave 3 51 58 64 65.8 73 97
Wave 4 50 57 64 65.1 72 99
Wave 5 50 57 64 65.4 72 100
Wave 6 50 58 65 65.8 72 100
Wave 7 52 60 66 67.2 73 101
Table 4.3: Females: distribution of age at interview across waves
Min. 1st Qu. Median Mean 3rd Qu. Max.
Wave 1 50 56 63 65.3 74 97
Wave 2 50 56 63 65.1 73 99
Wave 3 51 58 64 66.3 74 97
Wave 4 50 57 64 65.6 73 99
Wave 5 50 57 64 65.5 72 100
Wave 6 50 58 65 65.9 72 98
Wave 7 52 60 66 67.5 74 101
Table 4.3: Males: distribution of age at interview across waves
Min. 1st Qu. Median Mean 3rd Qu. Max.
Wave 1 50 55 61 63.4 70.8 94
Wave 2 50 56 62 63.9 70.0 92
Wave 3 51 58 64 65.2 71.0 94
Wave 4 50 57 63 64.5 71.0 96
Wave 5 50 57 64 65.3 72.0 98
Wave 6 50 58 65 65.6 72.0 100
Wave 7 52 60 66 66.9 73.0 98

4.1.3 Participants (P3)

4.1.3.1 Number of participants

Overall, 5452 unique participants were included in the data set, and the number of measurements (interviews) was 18632. Denmark participated in all waves of the study.

4.1.3.2 Number of interviews for each participant

Participants were most often interviewed 3 times (28%); the numbers of participants interviewed once or twice were very similar (about 18% each). The number of interviews ranged from 1 to 7, and only 22% of participants were interviewed 6 or 7 times. The distribution of the number of interviews is given in the table below and shown graphically.

Table 4.4: Number of interviews per participant
Number of interviews Frequency Proportion
1 965 0.18
2 966 0.18
3 1508 0.28
4 527 0.10
5 307 0.06
6 685 0.13
7 494 0.09
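
The frequencies in Table 4.4 can be cross-checked against the totals reported in Section 4.1.3.1. The Python sketch below copies the counts from the table; the variable names are illustrative.

```python
# Number of participants by number of interviews (counts from Table 4.4).
freq = {1: 965, 2: 966, 3: 1508, 4: 527, 5: 307, 6: 685, 7: 494}

n_participants = sum(freq.values())                 # unique participants
n_interviews = sum(k * n for k, n in freq.items())  # total number of interviews
prop = {k: round(n / n_participants, 2) for k, n in freq.items()}
share_6_or_7 = (freq[6] + freq[7]) / n_participants
```

Both totals agree with the 5452 participants and 18632 interviews reported above.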

4.1.4 Data collection (PE3)

Different types of questionnaire were used during the study. By design, the baseline questionnaire is used for the first interview and the longitudinal questionnaire for the follow-up interviews. The questionnaire used in the SHARELIFE interviews (Wave 3 and partly Wave 7) includes only a subset of the questions from the longitudinal questionnaire, plus additional questions about the life history of the participants. Metadata can be used to compare the questions included in different waves/questionnaires.

4.1.4.1 Type of questionnaire

The baseline questionnaire is used for most of the first interviews, and the longitudinal for follow-up interviews, but some exceptions are observed. Here we define a new variable typeQuest that can be used to check the type of questionnaire that was used in the study.

Table 4.5: Number of interviews per type of questionnaire and Wave
Baseline questionnaire Longitudinal questionnaire Sharelife NA Sum
Wave 1 1596 0 0 0 1596
Wave 2 1313 1174 0 0 2487
Wave 3 0 0 1979 0 1979
Wave 4 416 1696 0 0 2112
Wave 5 1892 2027 0 0 3919
Wave 6 265 3247 0 2 3514
Wave 7 1 1189 1835 0 3025
Sum 5483 9333 3814 2 18632
Table 4.5: Number of interviews per type of questionnaire and measurement occasion
M1 M2 M3 M4 M5 M6 M7
Baseline questionnaire 5450 17 3 4 5 4 0
Longitudinal questionnaire 2 2990 1128 1714 1575 1358 566
Sharelife 0 1203 2232 281 0 32 66
NA 0 1 0 0 1 0 0

The baseline questionnaire was used more than once for some participants (n=33), a longitudinal questionnaire was used at the first measurement for 2 participants, and the questionnaire type was unknown for 2 participants. In Wave 7 the baseline questionnaire was used for 1 participant (by design it should not have been used).

4.1.4.2 Changes in data collection

In the following (M4) we compare the proportions of item-missing values across waves to identify variables that are not available in all waves, either to confirm the information retrieved from the metadata or to reveal features that might not have been identified by looking at the metadata.

4.2 Missing values

4.2.1 Non-enrollment (M1)

Here the aim is to describe the non-enrolled (individuals who were selected but did not participate in the study) and the reasons for non-enrollment, if available. A detailed description of the selected sample is not provided.

The documentation published by the study reports that response rates were 63% in Wave 1/2, 80% in Wave 3, 50% in Wave 4, 60% in Wave 5, 47% in Wave 6 and 85% in Wave 7. It was reported that in Wave 1 the response rates were very similar for both sexes and across age groups.

We indirectly compare the responders to their target population in ME1, using publicly available data.

4.2.2 Drop-out (M2) and intermittent missingness (M3)

Here we describe the number and characteristics of participants who dropped out of the study during follow-up (loss to follow-up and other possible reasons: death, withdrawal, missing by design, if applicable). We also describe participants with intermittent missingness, that is, participants who have missing data for some of the measurements (intermittent, occasional omission) but do not drop out of the study. The summaries are based on the participants that had at least one valid interview (unit missingness other than due to non-enrollment).

4.2.2.1 Summaries of missing interviews based on wave as time metric

The follow-up of the subjects (number of interviews by Wave and proportion), stratified by baseline Wave, is shown in the table below and graphically.

The most dramatic decrease in the number of participants is observed in the second wave after inclusion. Only 40% of the participants included in Wave 1 had a valid interview in Wave 7, and 50% of those included in Wave 2.

Table 4.6: Number (n) and proportion (prop) of interviews by baseline wave
Baseline wave    Wave 1  Wave 2  Wave 3  Wave 4  Wave 5  Wave 6  Wave 7
Wave 1 (n)         1596    1185     984     875     842     738     632
Wave 1 (prop)      1.00    0.74    0.62    0.55    0.53    0.46    0.40
Wave 2 (n)            -    1302     995     823     843     739     656
Wave 2 (prop)         -    1.00    0.76    0.63    0.65    0.57    0.50
Wave 4 (n)            -       -       -     414     351     308     281
Wave 4 (prop)         -       -       -    1.00    0.85    0.74    0.68
Wave 5 (n)            -       -       -       -    1883    1472    1248
Wave 5 (prop)         -       -       -       -    1.00    0.78    0.66
Wave 6 (n)            -       -       -       -       -     257     208
Wave 6 (prop)         -       -       -       -       -    1.00    0.81
A dash (-) marks waves before the cohort was included.
4.2.2.2 Summary of the results about reasons for missing values drop-out

Here we describe the reason for missing values at interview level, summarizing the data by measurement occasion.

Below we show the distribution of the number of available interviews per measurement occasion, categorizing the potential measurements for each participant in each Wave into 7 categories:

  • Interview: the measurement was available.
  • Administrative censoring/No opportunity to measure: the measurement is not taken because the study ended (for example, participants included in wave 6 have only two possible measurement occasions)
  • Death: death was reported in the exit questionnaire (in the graph the dead participants are indicated as dead also in measurement occasions that go beyond the administrative censoring).
  • Out of household: not part of the household at the time of interview.
  • Out of sample: excluded from the study because of prolonged missingness (participants with non-response in many successive waves are labeled as out of sample); here we define a participant as out of sample from the first missing interview of the sequence that determines the exclusion from the study, so the definition is applied retrospectively.
  • Definitive missingness/Missing: unit was missing in the measurement occasion, had no valid interview in later waves, but was not classified as out of sample in the study.
  • Intermittent missing: participant was not interviewed in the measurement occasion but an interview at a later wave was obtained.
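
The categories above can be turned into a classification rule. The following Python sketch is only one possible reading of the definitions, not the code used for this report. The function classify_occasions and its arguments are hypothetical, and the order in which the rules are applied (for example, death taking precedence over administrative censoring) is an assumption based on the descriptions in the list.

```python
LAST_WAVE = 7  # last wave analyzed in this report


def classify_occasions(baseline_wave, interviewed, death_wave=None,
                       out_of_household=(), ever_out_of_sample=False,
                       n_occasions=7):
    """Label measurement occasions M1..Mn for one participant (hypothetical helper).

    interviewed: set of waves with a valid interview (includes baseline_wave).
    death_wave: first wave in which the death was reported, if any.
    out_of_household: waves in which the participant was not in the household.
    ever_out_of_sample: True if the study excluded the participant at some point.
    """
    last_interview = max(interviewed)
    labels = []
    for occasion in range(1, n_occasions + 1):
        wave = baseline_wave + occasion - 1
        if wave in interviewed:
            label = "Interview"
        elif death_wave is not None and wave >= death_wave:
            label = "Death"  # assumed to be kept also beyond the end of the study
        elif wave > LAST_WAVE:
            label = "Administrative censoring"
        elif wave in out_of_household:
            label = "Out of household"
        elif wave < last_interview:
            label = "Intermittent missing"  # a later interview was obtained
        elif ever_out_of_sample:
            label = "Out of sample"  # applied retrospectively to the final missing run
        else:
            label = "Missing"
        labels.append(label)
    return labels


# Included in Wave 5, missed Wave 6, interviewed again in Wave 7.
example = classify_occasions(5, {5, 7})
```

In this example the participant has an interview at M1 and M3, intermittent missingness at M2, and administrative censoring from M4 onwards.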

The vast majority of the subjects were potentially included in the study for at least three measurement occasions. For more than 40% of the subjects the study ended at the fourth measurement occasion (many subjects were included in Wave 4 or 5 and therefore cannot have more than 3 valid measurements).

Some participants had intermittent missingness (5% or less at each measurement occasion). Missingness by design because participants were not eligible was very rare (out of household, <1%), while administrative censoring and deaths were common, as was loss to follow-up due to other reasons.

Table 4.7: Number (n) and proportion (prop, by measurement occasion) of interviews by type of missingness.
Occasion     Interview  Out of household  Intermittent missing  Missing  Out of sample  Death  Administrative censoring
M1 (n)            5452                 0                     0        0              0      0                         0
M1 (prop)         1.00              0.00                  0.00     0.00           0.00   0.00                      0.00
M2 (n)            4211                 6                   274      592            207    162                         0
M2 (prop)         0.77              0.00                  0.05     0.11           0.04   0.03                      0.00
M3 (n)            3363                14                   289      788            335    409                       254
M3 (prop)         0.62              0.00                  0.05     0.14           0.06   0.08                      0.05
M4 (n)            1999                11                   146      364            380    561                      1991
M4 (prop)         0.37              0.00                  0.03     0.07           0.07   0.10                      0.37
M5 (n)            1581                 6                    73      297            379    726                      2390
M5 (prop)         0.29              0.00                  0.01     0.05           0.07   0.13                      0.44
M6 (n)            1394                 9                    34      352            379    894                      2390
M6 (prop)         0.26              0.00                  0.01     0.06           0.07   0.16                      0.44
M7 (n)             632                 3                     0      209            380    977                      3251
M7 (prop)         0.12              0.00                  0.00     0.04           0.07   0.18                      0.60

4.2.2.2.1 Number of missing interviews excluding deaths

The following tables explore the type of missing interviews, taking into account the number of reported deaths during follow-up, and evaluate the proportion of interviews carried out excluding the subjects that died during follow-up. We evaluate the number of deaths by wave of inclusion and the proportion of participants that survived through waves.

As expected, very few subjects died at the beginning of the follow-up, and most of the deaths involve individuals first included in the first two waves. In Wave 7 only 37% of the individuals included in Wave 1 were reported to be still alive, and 68% of those included in Wave 2, while almost all the individuals included in Wave 4 or later were still alive in Wave 7.

Table 4.8: Number (n) and proportion (prop) of individuals with reported death in each Wave (by baseline Wave) - each death appears only once, Sum gives the total number/proportion by baseline wave.
Baseline wave    Wave 1  Wave 2  Wave 3  Wave 4  Wave 5  Wave 6  Wave 7     Sum
Wave 1 (n)            0      66      97      89     112      92      84     540
Wave 1 (prop)      0.00    0.06    0.10    0.10    0.13    0.12    0.13    0.34
Wave 2 (n)            -       0      36      57      58      53      76     280
Wave 2 (prop)         -    0.00    0.04    0.07    0.07    0.07    0.12    0.22
Wave 4 (n)            -       -       -       0       3       3       5      11
Wave 4 (prop)         -       -       -    0.00    0.01    0.01    0.02    0.03
Wave 5 (n)            -       -       -       -       0      54      90     144
Wave 5 (prop)         -       -       -       -    0.00    0.04    0.07    0.08
Wave 6 (n)            -       -       -       -       -       0       3       3
Wave 6 (prop)         -       -       -       -       -    0.00    0.01    0.01
A dash (-) marks waves before the cohort was included.

About 20% of the participants have a missing value at the second measurement occasion; in later measurement occasions the number of missing values does not increase as dramatically. It is interesting to note that, taking the reported deaths into account, the number of missing values decreases in later interviews (Wave 1, measurement occasions 6 and 7).

In this section we evaluate in more detail the association between missingness and measured characteristics of the participants. The characteristics are compared using descriptive statistics, and baseline characteristics are compared among groups of participants (with complete response, lost to follow-up, with intermittent missingness, or who died during the study).

4.2.2.3 Descriptive statistics comparing the baseline characteristics by type of missingness

Participants were categorized into those with complete information, with intermittent missingness (at least one missing interview followed by at least one valid interview), lost to follow-up (only missing interviews from a certain point on, or defined as out of sample), not part of the household, and with reported death during the study, and were compared by their baseline characteristics. See the definitions in Section 2, types of missing values.

Baseline characteristics by type of missingness.
Variable              N     Complete              Death                 Intermittent missing  Lost to follow up     Out of household
                            N=2681                N=978                 N=476                 N=1296                N=21
gender : Female       5452  0.54 1440/2681        0.51 494/978          0.50 240/476          0.53 687/1296         0.38 8/21
age_int               5452  52.00 58.00 66.00     66.00 75.00 81.00     52.00 58.00 64.00     53.00 58.00 66.00     51.00 54.00 59.00
                            60.28 ± 8.79          73.29 ± 10.44         59.55 ± 8.18          60.20 ± 8.57          55.95 ± 6.41
age_int_cat : 50-59   5452  0.54 1452/2681        0.12 120/978          0.59 282/476          0.54 705/1296         0.81 17/21
  60-69                     0.29 780/2681         0.21 202/978          0.27 127/476          0.30 390/1296         0.14 3/21
  70-80                     0.14 384/2681         0.41 399/978          0.13 62/476           0.13 166/1296         0.05 1/21
  80+                       0.02 65/2681          0.26 257/978          0.01 5/476            0.03 35/1296          0.00 0/21
weight                5361  66.0 76.0 86.0        62.5 71.0 81.0        65.0 76.0 85.0        66.0 75.0 86.0        68.0 78.0 90.0
                            77.2 ± 15.2           72.7 ± 15.0           77.1 ± 15.6           76.9 ± 15.0           78.6 ± 14.5
height_imp            5418  165.00 172.00 178.00  163.00 169.00 175.00  165.00 172.00 178.00  165.00 172.00 178.00  165.00 173.00 185.00
                            171.82 ± 9.04         169.34 ± 8.80         171.66 ± 8.98         172.01 ± 9.40         174.67 ± 10.26
education_imp : Low   5428  0.17 447/2678         0.38 371/969          0.19 90/472           0.22 282/1288         0.05 1/21
  Medium                    0.38 1019/2678        0.39 375/969          0.41 195/472          0.41 531/1288         0.48 10/21
  High                      0.45 1212/2678        0.23 223/969          0.40 187/472          0.37 475/1288         0.48 10/21
pa_vig_freq           5423  0.67 1798/2677        0.35 339/965          0.66 311/473          0.63 810/1287         0.76 16/21
pa_low_freq           5422  0.94 2512/2677        0.73 707/964          0.95 447/473          0.93 1200/1287        0.95 20/21
cusmoke_imp : Yes     5423  0.22 590/2679         0.34 327/963          0.27 126/472          0.27 343/1288         0.43 9/21
maxgrip               5272  29.0 36.0 48.0        21.5 29.0 38.0        28.0 37.5 49.0        28.0 36.0 48.0        35.0 50.0 54.0
                            38.5 ± 12.5           30.3 ± 11.9           38.5 ± 13.1           38.3 ± 12.9           43.9 ± 13.1
a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. x ± s represents mean ± 1 SD. N is the number of non-missing values.

Deaths were more commonly observed among men, among participants who were older and had lower education, and among those who reported less physical activity and more smoking; participants who died also had considerably lower grip strength. Complete responders and non-responders for reasons other than death were similar in their baseline characteristics, except for education (higher among complete responders) and smoking (complete responders smoked less frequently than non-responders). Participants with intermittent missingness were slightly younger than the others. The characteristics of the small group of participants out of household indicated that this was a younger group.

Similar results were obtained also when the analysis was conducted within each wave or when the groups were compared using their missing status at second measurement occasion (data not shown).

4.2.2.4 Deaths: additional details

4.2.2.4.1 Quality of reporting of deaths

Overall, the quality of the reporting of deaths in the data from Denmark was very good. As shown below, the vital status was unknown for few participants and deaths were reported in a timely manner.

The table below shows the number of participants by vital status (dead/alive) at the last available wave (Wave 7), as reported in the coverscreen data.

Table 4.9: Number (n) and percentage (%) of participants classified by dead/alive status at last available wave, per country
Status   n     %
Unknown  53    1
Alive    4421  81.1
Dead     978   17.9

In Denmark the percentage of participants with unknown vital status at the end of the study was only 1% (data from Denmark are linked with the population registry).

Quality checks on the reported deaths

Here we assess the reporting of death in the dataset.

Overall, 978 participants were reported as dead by Wave 7, and the date of death was reported for 1085 participants. However, the two groups did not overlap completely. Some participants were categorized as dead but their date of death was missing (n=21). The participants with a reported date of death who were reported as alive in Wave 7 (n=128) consistently had dates of death in 2017 or later, indicating that they were still alive when Wave 7 was conducted.

The distribution of the year of death and the Wave where the death was reported in the coverscreen (as retrieved from the cover screen data and described in data cleaning section) is given below.

Most of the attributions are consistent (year of death and first Wave with reported death).

Table 4.10: Frequency of reported deaths by year of death and attributed first Wave where death occurred. The participants for which the date of death is reported but the
Wave 2 Wave 3 Wave 4 Wave 5 Wave 6 Wave 7 NA
2004 8 0 0 0 0 0 0
2005 34 0 0 0 0 0 0
2006 19 18 0 0 0 0 0
2007 5 44 0 0 0 0 0
2008 0 57 4 0 0 0 0
2009 0 8 64 0 0 0 0
2010 0 0 60 0 0 0 0
2011 0 0 13 46 0 0 0
2012 0 0 0 79 0 0 0
2013 0 0 0 42 18 0 0
2014 0 0 0 0 113 0 0
2015 0 0 0 0 68 31 0
2016 0 0 0 0 0 102 0
2017 0 0 0 0 0 80 29
2018 0 0 0 0 0 44 58
2019 0 0 0 0 0 0 41
NA 0 6 5 6 3 1 4346
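
The counts reported in the quality checks can be reconciled with the table above through simple arithmetic. The Python sketch below uses only numbers stated in this section; the variable names are illustrative.

```python
reported_dead = 978            # reported as dead by Wave 7
dead_without_date = 21         # categorized as dead, date of death missing
date_but_alive_at_wave7 = 128  # date of death reported, alive in Wave 7

# Participants with a reported date of death.
with_date_of_death = reported_dead - dead_without_date + date_but_alive_at_wave7

# NA column of the table: deaths by year not attributed to any wave up to Wave 7.
deaths_after_wave7 = {2017: 29, 2018: 58, 2019: 41}

# Last row of the table: deaths attributed to Waves 3 to 7 without a recorded date.
deaths_missing_date = [6, 5, 6, 3, 1]
```

The three figures are mutually consistent: 1085 participants have a date of death, the 128 deaths not attributed to a wave all occurred in 2017 or later, and the 21 deaths without a date match the last row of the table.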

4.2.2.5 Out of sample: additional details

We explored the characteristics of the participants that were categorized as Out of sample.

Overall, 382 participants were categorized as out of sample at some point during the study, 366 of which in Wave 7.

Here we display the combinations with at least two participants (covering all but 9 out-of-sample participants).

   M1    M2    M3    M4    M5    M6    M7 freq
2   1   -10   -10   -10   -10 -1000    NA  102
3   1   -10   -10   -10   -10   -10 -1000   95
4   1     1   -10   -10   -10   -10 -1000   68
5   1     1   -10   -10   -10 -1000    NA   57
6   1     1     1   -10   -10   -10 -1000   33
7   1   -10     1   -10   -10   -10 -1000   10
8   1 -1000 -1000 -1000 -1000 -1000 -1000    2
9   1 -1000 -1000    NA    NA    NA    NA    2
10  1   -10 -1001 -1000 -1000 -1000 -1000    2
11  1     1 -1000 -1000    NA    NA    NA    2

We can observe that the vast majority of participants with interview Out of sample (code -1000) are excluded from the study after 3, 4 or 5 missing interviews (code -10). This might indicate that in Denmark some rules that would exclude participants with long non-response are used to exclude participants from the study and that it is appropriate to interpret them as participants lost to follow-up.

The detailed exploration of metadata confirmed this finding (from Wave 7, the participants that did not participate for 3 consecutive interviews, or for which the end-of-life interview was not completed in two waves, were categorized as out of sample).
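
The number of missing interviews that precede the exclusion can be counted directly from the patterns shown above. This Python sketch uses the six most frequent patterns and the codes described in the text (-10 for a missing interview, -1000 for out of sample); the helper missing_run_before_exclusion is hypothetical, and NA is represented as None.

```python
from collections import Counter

MISSING, OUT_OF_SAMPLE = -10, -1000  # codes as described in the text above


def missing_run_before_exclusion(pattern):
    """Number of consecutive missing interviews immediately before the first
    out-of-sample code; None if the participant was never out of sample."""
    if OUT_OF_SAMPLE not in pattern:
        return None
    first_out = pattern.index(OUT_OF_SAMPLE)
    run = 0
    for code in reversed(pattern[:first_out]):
        if code != MISSING:
            break
        run += 1
    return run


# The six most frequent patterns (M1..M7) with their frequencies.
patterns = [
    ((1, -10, -10, -10, -10, -1000, None), 102),
    ((1, -10, -10, -10, -10, -10, -1000), 95),
    ((1, 1, -10, -10, -10, -10, -1000), 68),
    ((1, 1, -10, -10, -10, -1000, None), 57),
    ((1, 1, 1, -10, -10, -10, -1000), 33),
    ((1, -10, 1, -10, -10, -10, -1000), 10),
]

run_counts = Counter()
for pattern, freq in patterns:
    run_counts[missing_run_before_exclusion(pattern)] += freq
```

For these patterns the exclusion always follows a run of 3, 4 or 5 missing interviews, in line with the observation above.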

4.2.2.6 Out of household: additional details

Overall, only 34 participants were categorized as out of household at some point during the study, the numbers were rather uniform across waves.

   M1    M2    M3    M4    M5    M6    M7 freq
2   1   -10 -1001    NA    NA    NA    NA    3
3   1   -10 -1001 -1000 -1000 -1000 -1000    2
4   1     1 -1001    NA    NA    NA    NA    2
5   1     1     1     1     1 -1001    NA    2
6   1 -1001 -1001 -1001 -1001 -1000    NA    1
7   1 -1001 -1000 -1000 -1000  -100    NA    1
8   1 -1001  -100  -100  -100  -100  -100    1
9   1 -1001   -10 -1000  -100  -100  -100    1
10  1 -1001     1     1     1     1     1    1
11  1 -1001     1    NA    NA    NA    NA    1
12  1   -10 -1001 -1001 -1001 -1001    NA    1
13  1   -10 -1001 -1001   -10  -100    NA    1
14  1   -10   -10 -1001     1     1  -100    1
15  1   -10   -10 -1001     1     1     1    1
16  1   -10   -10   -10 -1001 -1001    NA    1
17  1   -10   -10     1     1     1 -1001    1
18  1     1 -1001 -1001 -1001     1     1    1
19  1     1 -1001 -1001     1     1     1    1
20  1     1 -1001 -1001    NA    NA    NA    1
21  1     1 -1001     1     1     1     1    1
22  1     1   -10 -1001 -1001 -1001    NA    1
23  1     1   -10 -1001   -10   -10   -10    1
24  1     1   -10   -10   -10 -1001     1    1
25  1     1   -10     1     1 -1001    NA    1
26  1     1     1 -1001   -10   -10  -100    1
27  1     1     1     1 -1001     1    NA    1
28  1     1     1     1     1 -1001 -1001    1
29  1     1     1     1     1 -1001  -100    1
30  1     1     1     1     1     1 -1001    1

We can observe that most of the combinations appear only once. In some cases participants are categorized as out of sample after having been out of the household. In a few cases the participants that were out of the household re-enter the study (they are interviewed again or appear as having missing interviews). Given the very small number of participants in this group, their further study does not seem of interest. In the statistical analyses these observations should be treated as missing by design.

4.2.3 Variable missingness (item missingness, M4)

Here we describe the missing values for the variables included in the analysis strategy (AS) as appearing in the model addressing the primary research question. The analysis is restricted to the statistical units for which the interviews were available (unit missingness is not addressed in this section). We explore also the amount of missing outcomes among the interviews that were conducted.

4.2.3.1 Item missingness at baseline interview, overall and by sex

Number and percentage of missing values at baseline interview.

                 All                       Females                   Males
Variable         Missing (n)  Missing (%)  Missing (n)  Missing (%)  Missing (n)  Missing (%)
maxgrip                  180         3.30          109         3.80           71         2.75
weight                    91         1.67           73         2.54           18         0.70
height_imp                34         0.62           21         0.73           13         0.50
pa_low_freq               30         0.55           13         0.45           17         0.66
pa_vig_freq               29         0.53           12         0.42           17         0.66
cusmoke_imp               29         0.53           13         0.45           16         0.62
education_imp             24         0.44           10         0.35           14         0.54
age_int                    0         0.00            0         0.00            0         0.00
gender                     0         0.00

Overall, the number of missing items at baseline is very small; the outcome variable maxgrip had the most missing values (3.3%). Age and sex were not missing for any participant at the baseline interview or in the longitudinal interviews. For this reason these variables were omitted from further summaries of missing values.

Also when stratified by sex the percentages of item-missing values were low; weight was missing more frequently for women.
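
Counts and percentages of item-missing values such as those in this section could be obtained along the following lines. This is a minimal Python sketch with made-up records; item_missingness is a hypothetical helper, and None marks an item that was not recorded.

```python
def item_missingness(records, variables):
    """Count and percentage of missing (None) values per variable."""
    n = len(records)
    result = {}
    for var in variables:
        missing = sum(1 for r in records if r.get(var) is None)
        result[var] = (missing, round(100 * missing / n, 2))
    return result


# Hypothetical baseline records for four participants.
records = [
    {"sex": "F", "maxgrip": 38.0, "weight": 77.0},
    {"sex": "F", "maxgrip": None, "weight": 81.5},
    {"sex": "M", "maxgrip": 29.0, "weight": None},
    {"sex": "M", "maxgrip": None, "weight": 64.0},
]

overall = item_missingness(records, ["maxgrip", "weight"])

# The same summary stratified by a structural variable, here sex.
by_sex = {
    sex: item_missingness([r for r in records if r["sex"] == sex], ["maxgrip", "weight"])
    for sex in ("F", "M")
}
```

Stratifying by age group or by wave follows the same pattern, with a different grouping variable.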

4.2.3.2 Item missingness at baseline interview, by age group

                 All                       50-59                     60-69                     70-80                     80+
Variable         Missing (n)  Missing (%)  Missing (n)  Missing (%)  Missing (n)  Missing (%)  Missing (n)  Missing (%)  Missing (n)  Missing (%)
maxgrip                  180         3.30           57         2.21           35         2.33           43         4.25           45        12.43
weight                    91         1.67           30         1.16           21         1.40           22         2.17           18         4.97
height_imp                34         0.62            6         0.23            4         0.27           11         1.09           13         3.59
pa_low_freq               30         0.55           10         0.39            5         0.33            7         0.69            8         2.21
pa_vig_freq               29         0.53           11         0.43            5         0.33            5         0.49            8         2.21
cusmoke_imp               29         0.53           10         0.39            5         0.33            6         0.59            8         2.21
education_imp             24         0.44           13         0.50            4         0.27            3         0.30            4         1.10

When stratified by age group, the percentages of item-missing values increased somewhat with age, most notably for the outcome variable.

4.2.3.3 Item missingness at baseline interview, by Wave

Summaries of missingness by wave are useful for visualizing possible heterogeneity across waves.

Baseline interviews taken in different waves do not differ substantially in terms of missing values. Note that Wave 4 and 6 had a small number of baseline interviews, therefore deviations of percentages of missing values from the other waves should not be overinterpreted.

4.2.3.4 Item missingness at baseline and longitudinal interviews, by Wave

The variable with the most problematic pattern of missing values in the longitudinal interviews was current smoking. Current smoking was not recorded in the longitudinal interviews in Waves 6 and 7, nor in the SHARELIFE Wave 3 interviews, while it was recorded in the baseline interviews in all waves. The generated variable cusmoke provided by SHARE, which should report current smoking, was not defined in Waves 6 and 7, even when the data were available (in baseline interviews).

      
       Wave 1 Wave 2 Wave 3 Wave 4 Wave 5 Wave 6 Wave 7
  No     1091   1782      0   1590   3138    210     41
  Yes     499    649      0    488    775     56    165
  <NA>      6     56   1979     34      6   3248   2819

The analyst might decide to consider smoking status at baseline rather than current smoking in the statistical analysis.

The following summaries will use the variable smoking at baseline (time-fixed).

In the following summaries we consider only variables that are time-varying in the AS (education and height and smoking at baseline are excluded).

Table 4.11: Number (n) and percentage (%) of missing values per variable, by Wave
              Wave 1      Wave 2      Wave 3          Wave 4      Wave 5      Wave 6      Wave 7
Variable      n    %      n    %      n     %         n    %      n    %      n    %      n     %
weight        26   1.63   46   1.85   1979  100.00    36   1.70   60   1.53   59   1.68   70    2.31
pa_vig_freq   5    0.31   57   2.29   1979  100.00    35   1.66   6    0.15   5    0.14   1836  60.69
pa_low_freq   4    0.25   57   2.29   1979  100.00    35   1.66   8    0.20   5    0.14   1835  60.66
maxgrip       66   4.14   91   3.66   70    3.54      78   3.69   142  3.62   107  3.04   149   4.93

There are large differences between waves in terms of missing values, especially for SHARELIFE interviews (Wave 3 and partly Wave 7, where a part of the interviews are SHARELIFE interviews), in which some variables are missing by design. The proportion of missing values in the outcome variable was roughly comparable among waves and was highest in the last wave.

In our study the missingness by design in the SHARELIFE interviews of the two variables about physical activity is the most problematic aspect; weight is missing by design in wave 3 but not in wave 7 SHARELIFE interviews. This characteristic indicates that complete case analysis would not be feasible if weight and physical activity are used as explanatory variables in the models.

To further explore the effect of waves and/or baseline vs longitudinal interviews, we repeated the analyses stratifying the results by type of interview (baseline, longitudinal or SHARELIFE interviews). This graph makes the missingness by design easier to understand.

Some variables are missing by design in longitudinal or SHARELIFE interviews, for example current smoking in Wave 6 and 7, or height in Wave 4 and 7. If both variables are used as time fixed variables, measured at baseline, this does not constitute a problem in our study. Other variables missing by design: physical activity variables in SHARELIFE interviews, weight in SHARELIFE interviews of Wave 3.

4.2.3.5 Item missingness by measurement occasion

4.2.3.6 Item missingness by measurement occasion - removing the missing by design missingness

Here we evaluate the percentages of missing values, taking into account in which interviews the values are missing by design (and excluding them).

The proportion of participants with missing values of some time-varying variables is very small if the measurements where the variables are missing by design are not considered. Only for the outcome variable we observed that the proportion of participants with missing values in the outcome (and valid interview) increased at later measurement occasions.
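One way to exclude missingness by design is to drop, for each variable, the interviews in which the item was not asked before computing the percentage. The sketch below is written in Python for illustration only; the rule set paraphrases the text above (physical activity in SHARELIFE interviews, weight in Wave 3 SHARELIFE interviews), and the record layout, names and toy data are assumptions.

```python
# (variable, interview_type, wave) combinations where the item is not asked.
# wave=None means "any wave". Hypothetical encoding of the rules in the text.
BY_DESIGN = {
    ("pa_vig_freq", "sharelife", None),
    ("pa_low_freq", "sharelife", None),
    ("weight", "sharelife", 3),
}


def is_by_design(variable, interview_type, wave):
    """True if the variable is missing by design in this kind of interview."""
    return ((variable, interview_type, None) in BY_DESIGN
            or (variable, interview_type, wave) in BY_DESIGN)


def pct_missing(records, variable, exclude_by_design=True):
    """Percentage of missing values, optionally ignoring by-design interviews."""
    eligible = [
        r for r in records
        if not (exclude_by_design and is_by_design(variable, r["type"], r["wave"]))
    ]
    if not eligible:
        return None  # the variable was never asked in these interviews
    n_missing = sum(r[variable] is None for r in eligible)
    return round(100.0 * n_missing / len(eligible), 2)


# Hypothetical toy interviews, not SHARE data.
toy = [
    {"wave": 3, "type": "sharelife", "weight": None},
    {"wave": 7, "type": "sharelife", "weight": 70.0},
    {"wave": 2, "type": "longitudinal", "weight": None},
    {"wave": 2, "type": "longitudinal", "weight": 80.0},
]
with_design = pct_missing(toy, "weight", exclude_by_design=False)
without_design = pct_missing(toy, "weight", exclude_by_design=True)
```

In the toy data the Wave 3 SHARELIFE interview is removed from the denominator, so the percentage drops once by-design missingness is excluded.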

Table 4.12: Number (n) and percentage (%) of missing values per variable, by measurement occasion
M1 M2 M3 M4 M5 M6 M7
weight
% NA 1.67 1.49 2.19 2.15 1.77 1.79 1.58
NA 91 48 52 43 28 25 10
n 5452 3216 2379 1999 1581 1394 632
pa_vig_freq
% NA 0.53 1.40 1.33 1.16 0.13 0.22 0.00
NA 29 42 15 20 2 3 0
n 5452 3008 1131 1718 1581 1362 566
pa_low_freq
% NA 0.55 1.40 1.33 1.16 0.13 0.15 0.00
NA 30 42 15 20 2 2 0
n 5452 3008 1131 1718 1581 1362 566
maxgrip
% NA 3.30 2.90 3.93 4.20 4.24 5.31 6.96
NA 180 122 132 84 67 74 44
n 5452 4211 3363 1999 1581 1394 632

By sex

Table 4.13: Number (n) and percentage (%) of missing values per variable, by measurement occasion by sex
M1 M2 M3 M4 M5 M6 M7
weight Males
% NA 0.70 0.65 0.63 0.96 0.83 1.24 1.02
NA/n 18/2583 10/1528 7/1107 9/940 6/720 8/646 3/294
weight Females
% NA 2.54 2.25 3.54 3.21 2.56 2.27 2.07
NA/n 73/2869 38/1688 45/1272 34/1059 22/861 17/748 7/338
pa_vig_freq Males
% NA 0.66 1.20 0.57 1.01 0.28 0.48 0.00
NA/n 17/2583 17/1419 3/526 8/793 2/720 3/630 0/257
pa_vig_freq Females
% NA 0.42 1.57 1.98 1.30 0.00 0.00 0.00
NA/n 12/2869 25/1589 12/605 12/925 0/861 0/732 0/309
pa_low_freq Males
% NA 0.66 1.20 0.57 1.01 0.28 0.32 0.00
NA/n 17/2583 17/1419 3/526 8/793 2/720 2/630 0/257
pa_low_freq Females
% NA 0.45 1.57 1.98 1.30 0.00 0.00 0.00
NA/n 13/2869 25/1589 12/605 12/925 0/861 0/732 0/309
maxgrip Males
% NA 2.75 2.02 2.50 2.87 3.19 4.49 5.10
NA/n 71/2583 40/1983 39/1562 27/940 23/720 29/646 15/294
maxgrip Females
% NA 3.80 3.68 5.16 5.38 5.11 6.02 8.58
NA/n 109/2869 82/2228 93/1801 57/1059 44/861 45/748 29/338

4.2.3.7 Item missingness of outcome: additional details

4.2.3.8 Item missingness of outcome: additional details

We restricted the attention to item missingness of maxgrip across measurement occasions (maxgrip missing, interview performed).

Outcome missingness was between 2.9% and 7.0% across measurement occasions.

Note that the number of interviews is not comparable across measurement occasions, as fewer observations are available for later measurement occasions.

Table 4.14: Number and percentage of missing values by measurement occasion
                         M1     M2     M3     M4     M5     M6     M7
Number of participants   5452   4211   3363   1999   1581   1394   632
Missing                  180    122    132    84     67     74     44
% Missing                3.3    2.9    3.9    4.2    4.2    5.3    7.0

The table below gives the distribution of number of missing values in outcome by number of measurements (interviews available)

Table 4.15: Number of participants by number of interviews (M) and number of missing outcome values
Interviews   Number of missing outcome values     Total
             0      1     2     3     4     5     n
M = 1        877    88    0     0     0     0     965
M = 2        883    64    19    0     0     0     966
M = 3        1387   90    23    8     0     0     1508
M = 4        460    45    16    5     1     0     527
M = 5        248    38    15    3     2     1     307
M = 6        615    51    12    3     1     3     685
M = 7        448    35    5     5     1     0     494

4.2.3.8.1 Outcome missingness stratified by age and sex

Here we explore the association between age (time metric from AS) and outcome missingness. The probability of missing outcome considerably increased with age, especially for women. This is shown by using descriptive statistics of the proportion of missing outcomes by sex and age group, in the complete data set (using all observations).

Table 4.16: Percentage (%) and number (missing/total) of missing outcome values, by age group and sex
             50-59      60-69      70-79       80+
Males        1.5        1.9        3.1         11.4
             45/2890    57/2989    63/1994     79/611
Females      2.4        2.7        6.2         13.8
             77/3159    89/3226    140/2104    153/956

Similar results are obtained using data from first interview only.

Table 4.17: Percentage (%) and number (missing/total) of missing outcome values at first interview, by age group and sex
             50-59      60-69     70-79     80+
Males        1.9        2.3       3.2       10.9
             23/1207    17/717    15/457    16/131
Females      2.5        2.3       5.2       13.5
             34/1312    18/750    28/512    29/186

A graphical display is also provided that uses smoothers (method gam in the geom_smooth function) to estimate the probability of missing values by age (at the baseline interview and for each separate wave). As the smoothers can produce unstable estimates, these graphs should not be over-interpreted.

4.2.3.8.2 Description of the participants with outcome missing at all (available) interviews

Overall, 117 participants had missing values in the outcome at all measurement occasions (at valid interviews); most of them (n = 88, 75%) were measured only once.

Participants with all missing outcomes were older, less physically active, more commonly female, and had lower education than those with at least one non-missing outcome.

Baseline characteristics by all missing outcome (0: not all NA, 1: all NA outcomes).
Variable               N      0 (N=5335)              1 (N=117)
gender : Female        5452   0.52 2795/5335          0.63 74/117
age_int                5452   53.0 60.0 69.0          59.0 74.0 84.0
                              62.3 ± 10.1             72.6 ± 14.1
age_int_cat : 50-59    5452   0.48 2546/5335          0.26 30/117
  60-69                       0.28 1482/5335          0.17 20/117
  70-80                       0.18 981/5335           0.26 31/117
  80+                         0.06 326/5335           0.31 36/117
weight                 5361   65.0 75.0 85.0          60.0 70.0 82.0
                              76.5 ± 15.2             71.7 ± 15.6
height_imp             5418   165.00 171.00 178.00    164.00 168.00 172.25
                              171.47 ± 9.12           168.84 ± 9.60
education_imp : Low    5428   0.21 1141/5320          0.46 50/108
  Medium                      0.39 2092/5320          0.35 38/108
  High                        0.39 2087/5320          0.19 20/108
pa_vig_freq            5423   0.61 3253/5320          0.20 21/103
pa_low_freq            5422   0.91 4842/5319          0.43 44/103
cusmoke_imp : Yes      5423   0.26 1370/5319          0.24 25/104
a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. x ± s represents X ± 1 SD. N is the number of non-missing values.

When the previous table is stratified by sex, the results are similar to those of the complete analysis.

Baseline characteristics by all missing outcome by sex (0: not all NA, 1: all NA outcomes).
Variable               N      0                       1
Male                          N=2540                  N=43
age_int                2583   53.00 60.00 69.00       57.50 69.00 81.00
                              62.11 ± 9.79            70.12 ± 13.76
age_int_cat : 50-59    2583   0.48 1217/2540          0.30 13/43
  60-69                       0.29 725/2540           0.21 9/43
  70-80                       0.18 464/2540           0.19 8/43
  80+                         0.05 134/2540           0.30 13/43
weight                 2565   75.0 82.0 92.0          72.8 81.5 90.5
                              84.0 ± 13.6             82.7 ± 14.8
height_imp             2570   173.00 178.00 183.00    167.75 174.50 184.25
                              178.12 ± 7.03           176.22 ± 9.29
education_imp : Low    2569   0.15 385/2531           0.32 12/38
  Medium                      0.47 1187/2531          0.45 17/38
  High                        0.38 959/2531           0.24 9/38
pa_vig_freq            2566   0.64 1625/2531          0.20 7/35
pa_low_freq            2566   0.91 2315/2531          0.46 16/35
cusmoke_imp : Yes      2567   0.27 686/2531           0.28 10/36
Female                        N=2795                  N=74
age_int                2869   53.0 60.0 70.0          62.2 77.0 86.0
                              62.5 ± 10.4             74.1 ± 14.2
age_int_cat : 50-59    2869   0.48 1329/2795          0.23 17/74
  60-69                       0.27 757/2795           0.15 11/74
  70-80                       0.18 517/2795           0.31 23/74
  80+                         0.07 192/2795           0.31 23/74
weight                 2796   60.0 68.0 77.0          56.0 63.0 72.0
                              69.5 ± 13.2             65.1 ± 12.0
height_imp             2848   161.00 165.00 170.00    162.00 165.00 170.00
                              165.42 ± 6.09           164.69 ± 6.94
education_imp : Low    2859   0.27 756/2789           0.54 38/70
  Medium                      0.32 905/2789           0.30 21/70
  High                        0.40 1128/2789          0.16 11/70
pa_vig_freq            2857   0.58 1628/2789          0.21 14/68
pa_low_freq            2856   0.91 2527/2788          0.41 28/68
cusmoke_imp : Yes      2856   0.25 684/2788           0.22 15/68
a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. x ± s represents X ± 1 SD. N is the number of non-missing values.

4.2.3.8.3 Reason for missing values in the outcome

Metadata indicate that some variables in the GS module provide information about the reason for missing data in maxgrip in Wave 1. However, the information is not complete (the variable is not recorded in Wave 7 and has a large proportion of missing values in Wave 3).

In the data from Denmark, 37% of the missing outcome values are due to the participant being unable to take the measurement, which indicates that missing values might be related to poor physical condition; 19% are due to refusal to take the measurement, and the reason for missingness is not known in 37% of the cases.

Table 4.18: Number and percentage of participants with all missing outcomes
                                   Non missing outcome    Missing outcome
                                   n        %             n      %
R agrees to take measurement       15052    84            15     2
R refuses to take measurement      1        0             134    19
R is unable to take measurement    0        0             263    37
Proxy-interview                    0        0             28     4
(not recorded)                     2876     16            263    37

For about 10% of missing outcome values the participants were unable to use one or both hands (gs002_ variable).

For most participants with a missing outcome, all the individual measurements (4 measurements, 2 per hand) are missing.

4.2.4 Patterns (M5)

Here we show the co-occurrence of item missingness across variables (the minimum set size to be displayed is set to 5).

4.2.4.1 Co-occurrence of item missingness at baseline

There is no common pattern of missingness between variables at baseline. Most missing values appear in only one variable. Grip strength (maxgrip) is the variable with most missing values, followed by weight.

4.2.4.2 Co-occurrence of outcome missingness across measurement occasions

There was no clear association between missingness in different measuring occasions - a relatively small proportion of subjects had co-occurrence of outcome missingness in more than one occasion.

4.2.4.3 Co-occurrence of item-missingness across measurement occasions for time-varying covariates

There is no clear pattern of co-occurrence of missing values of the time varying covariates across measurement occasions. Here we did not consider as missing the variables missing by design (weight in wave 3, PA in SHARELIFE interviews). The graphs are omitted from this report.

4.2.5 Comparison of non-enrolled and target population (ME1)

Here the aim is to understand if the non-enrolled (participants that fulfill the inclusion criteria that do not participate in the study) differ from responders and how they compare to the target population.

The characteristics of the non-enrolled could be studied only indirectly, by comparing the samples of responders with some known characteristics of the target population (sex, age and education composition; EUROSTAT data are available from 2007, the year of Wave 2 of the study), as data on the non-enrolled are not provided by the SHARE study (ME1 domain).

The age, sex and education distributions of the responders were compared to those of the target population (EUROSTAT data, available from 2007, accessed in August 2022) for each of the waves. For Waves 2 and 5 we also analyzed the random refreshment samples (excluding the oversampled younger cohort; the two subsamples can be identified using the study meta-data). The comparison of these samples with the target population is the most straightforward analysis for studying the characteristics of the responders, while the comparison of the full samples of responders from Waves 2 to 7 with their target population provides a means of assessing the characteristics of non-responders and of participants lost to follow-up. For Waves 3 and 7 the target population was taken to be the population aged 52 and older.

The results of all these analyses indicated that the responders who participated in the survey at least once had substantially higher education than the population in the same age and sex groups; males in the younger age groups were slightly underrepresented, as were the older women.
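A simple way to quantify under- or over-representation is the ratio between the share of a category in the sample and its share in the population; values below 1 indicate underrepresentation. The sketch below is in Python for illustration only. The function name and all counts are hypothetical and are not taken from SHARE or EUROSTAT, and the report itself does not state that this ratio was used.

```python
def representation_ratio(sample_counts, population_counts):
    """Ratio of sample share to population share for each category.

    Both arguments map category -> count; a ratio < 1 means the category
    is underrepresented in the sample relative to the population.
    """
    s_total = sum(sample_counts.values())
    p_total = sum(population_counts.values())
    return {
        cat: (sample_counts.get(cat, 0) / s_total) / (population_counts[cat] / p_total)
        for cat in population_counts
    }


# Hypothetical education counts, chosen only to illustrate the calculation.
sample = {"low": 20, "high": 80}
population = {"low": 400, "high": 600}
ratios = representation_ratio(sample, population)
```

In this toy example the low-education category has a ratio of one half, that is, its share in the sample is half of its share in the population.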

4.2.5.1 Non-response, using data from Wave 2 and Wave 5

Only Waves 2 and 5 provide full age-range samples that can be used to study the characteristics of non-responders. For presentation purposes, participants aged 85 and older were combined into a single age group because of their small number. Population data about education are mostly missing for individuals older than 85 in 2007; the analyses about education are therefore restricted to individuals up to that age.

4.2.5.1.1 Wave 2

Here we restrict the attention to the random refreshment sample from Wave 2 that responded to the interview and compare it to the target population in terms of age, sex and education.

The analysis included 1084 participants from the random sample Wave 2.

Sex

The distribution of sex in the sample and in the population is similar; a deviation can be observed in the younger age group, where males are underrepresented in the sample.

Age

The older women are somewhat underrepresented in the sample compared to the population.

Table 4.19: Distribution of age, population 2007 vs random sample in Wave 2
Min. 1st Qu. Median Mean 3rd Qu. Max.
Population females 50 57 63 65.7 74 100
Sample W2 females 50 56 63 64.7 72 98
Population males 50 56 62 63.7 70 100
Sample W2 males 50 57 63 64.4 70 92

Education

The lower educated individuals are underrepresented in the sample. The differences between the sample and the population seem present in all age and sex groups.

4.2.5.1.2 Wave 5

Here we restrict the attention to the random refreshment sample from Wave 5 that responded to the interview and compare it to the target population in terms of age, sex and education.

The analysis included 1629 participants from the random sample Wave 5.

Sex

The distribution of sex differs between sample and population more than in Wave 2; males in the younger age groups are more underrepresented.

Age

The older women are somewhat underrepresented in the sample compared to the population, as are the younger men.

Distribution of age, population 2007 vs random sample in Wave 5
Min. 1st Qu. Median Mean 3rd Qu. Max.
Population females 50 57 65 66.0 73 100
Sample W5 females 50 58 64 65.4 72 100
Population males 50 56 63 64.3 71 100
Sample W5 males 50 58 66 66.1 72 98

Education

The lower educated individuals are underrepresented in the sample. The differences between the sample and the population seem present in all age and sex groups - only older women have a similar distribution of education in sample and population.

4.2.5.2 Non-response and loss to follow-up

Here we compare the observed samples (all but Wave 1) with the population values.

Wave 2, all participants

All respondents from wave 2 vs population

The analysis included 2487 participants from Wave 2.

Sex

The distribution of sex differs between sample and population; males in the younger age groups are more underrepresented.

Age

The older women are somewhat underrepresented in the sample compared to the population, as are the younger men.

Table 4.20: Distribution of age, population 2007 vs all respondents in Wave 2
Min. 1st Qu. Median Mean 3rd Qu. Max.
Population females 50 57 65 66.0 73 100
Sample W2 females 50 56 63 65.1 73 99
Population males 50 56 63 64.3 71 100
Sample W2 males 50 56 62 63.9 70 92

Education

The lower educated individuals are underrepresented in the sample. The differences between the sample and the population seem present in all age and sex groups - only older women have a similar distribution of education in sample and population.

Wave 3, all participants

All respondents from wave 3 vs population of 52+ (no refreshment samples in wave 3). (The labels of the younger age group indicate 50-55 but refer to 52-55).

The analysis included 1979 participants from Wave 3.

Sex

As in the other waves, males in the younger age groups are underrepresented.

Age

The older women are somewhat underrepresented in the sample compared to the population, as are the younger men.

Table 4.21: Distribution of age, population 2007 vs all respondents in Wave 3
Min. 1st Qu. Median Mean 3rd Qu. Max.
Population females 52 59 66 67.1 74 100
Sample W3 females 51 58 64 66.3 74 97
Population males 50 56 63 64.3 71 100
Sample W3 males 51 58 64 65.2 71 94

Education

The lower educated individuals are underrepresented in the sample. The differences between the sample and the population seem present in all age and sex groups - only older women have a similar distribution of education in sample and population.

Wave 4, all participants

All respondents from wave 4 vs population

The analysis included 2112 participants from Wave 4.

Sex

The distribution of sex differs between sample and population; males in the younger age groups are more underrepresented.

Age

The older women are somewhat underrepresented in the sample compared to the population, as are the younger men.

Table 4.22: Distribution of age, population 2011 vs all respondents in Wave 4
Min. 1st Qu. Median Mean 3rd Qu. Max.
Population females 50 57 65 66.0 73 100
Sample W4 females 50 57 64 65.6 73 99
Population males 50 56 63 64.3 71 100
Sample W4 males 50 57 63 64.5 71 96

Education

The lower educated individuals are underrepresented in the sample. The differences between the sample and the population seem present in all age and sex groups - only older women have a similar distribution of education in sample and population.

Wave 5, all participants

All respondents from Wave5 vs population

The analysis included 3919 participants from Wave 5.

Sex

The distribution of sex differs between sample and population; males in the younger age groups are more underrepresented.

Age

The older women are somewhat underrepresented in the sample compared to the population, as are the younger men.

Table 4.23: Distribution of age, population 2013 vs all respondents in Wave 5
Min. 1st Qu. Median Mean 3rd Qu. Max.
Population females 50 57 65 66.0 73 100
Sample W5 females 50 57 64 65.5 72 100
Population males 50 56 63 64.3 71 100
Sample W5 males 50 57 64 65.3 72 98

Education

The lower educated individuals are underrepresented in the sample. The differences between the sample and the population seem present in all age and sex groups - only older women have a similar distribution of education in sample and population.

Wave 6, all participants

All respondents from Wave 6 vs population

The analysis included 3514 participants from Wave 6.

Sex

The distribution of sex differs between sample and population; males in the younger age groups are more underrepresented.

Age

The older women are somewhat underrepresented in the sample compared to the population, as are the younger men.

Table 4.24: Distribution of age, population 2015 vs all respondents in Wave 6
Min. 1st Qu. Median Mean 3rd Qu. Max.
Population females 50 57 65 66.0 73 100
Sample W6 females 50 58 65 65.9 72 98
Population males 50 56 64 64.5 71 100
Sample W6 males 50 58 65 65.6 72 100

Education

The lower educated individuals are underrepresented in the sample. The differences between the sample and the population seem present in all age and sex groups - only older women have a similar distribution of education in sample and population.

Wave 7, all participants

All respondents from Wave 7 vs population of 52+ (no refreshment samples in wave 7). (The labels of the younger age group indicate 50-55 but refer to 52-55).

The analysis included 3025 participants from Wave 7.

Sex

As in the other waves, males in the younger age groups are underrepresented.

Age

The older women are somewhat underrepresented in the sample compared to the population, as are the younger men.

Table 4.25: Distribution of age, population 2007 vs all respondents in Wave 7
Min. 1st Qu. Median Mean 3rd Qu. Max.
Population females 52 59 66 67.1 74 100
Sample W7 females 52 60 66 67.5 74 101
Population males 50 56 63 64.3 71 100
Sample W7 males 52 60 66 66.9 73 98

Education

The lower educated individuals are underrepresented in the sample. The differences between the sample and the population seem present in all age and sex groups - only older women have a similar distribution of education in sample and population.

4.2.6 Probability of loss to follow-up and death (ME2)

For the purposes of the analysis, the participants in some of the groups would be classified as lost to follow-up (LTF): out of sample, definitive missingness, or out of household if not re-included in the analysis later. Using this definition we estimated the probability of loss to follow-up, death, and death after loss to follow-up. We estimated the cumulative incidence functions of loss to follow-up and death using Aalen-Johansen estimators (defining death times/events only for those that are not lost to follow-up, as if LTF was an absorbing state), and used the Kaplan-Meier estimator to estimate the probability of death after loss to follow-up (time of entry = the time of LTF, time of end = death, time of censoring = the end of the study for those who are not dead). The estimates were stratified by sex only, or by sex and age group.

Overall, the estimated probability of loss to follow-up increased most notably at the second interview (about 20% two years after the first interview) and rose to about 40% by the end of the study. The estimated probability of death by the end of the study was about 20% prior to drop-out and about 35% after drop-out, somewhat larger for males.
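To make the cumulative incidence calculation concrete, the following is a minimal sketch of the Aalen-Johansen estimator for two competing events, written in plain Python for illustration. It is not the code used for the report; the event coding (0 = censored, 1 = loss to follow-up, 2 = death), the function name and the toy data are assumptions.

```python
def aalen_johansen(times, events, causes=(1, 2)):
    """Cumulative incidence functions for competing risks.

    times  : follow-up time of each participant
    events : 0 = censored, otherwise the code of the event that occurred
    Returns a list of (time, {cause: cumulative incidence}) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0                      # probability of being event-free just before t
    cif = {c: 0.0 for c in causes}
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = {c: 0 for c in causes}  # number of events of each cause at time t
        j = i
        while j < len(data) and data[j][0] == t:
            if data[j][1] in d:
                d[data[j][1]] += 1
            j += 1
        total = sum(d.values())
        if total > 0:
            for c in causes:
                cif[c] += surv * d[c] / n_at_risk
            surv *= 1.0 - total / n_at_risk
            curve.append((t, dict(cif)))
        n_at_risk -= j - i          # events and censored leave the risk set
        i = j
    return curve


# Hypothetical toy data: 5 participants.
toy_times = [1, 2, 2, 3, 4]
toy_events = [1, 0, 2, 1, 0]
curve = aalen_johansen(toy_times, toy_events)
last_time, last_cif = curve[-1]
```

At each event time the increment for a cause is the event-free probability just before that time multiplied by the cause-specific hazard, so the cumulative incidences of the two events and the event-free probability always sum to one.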

The probability of loss to follow-up was virtually the same across age and sex. In contrast, the probability of death prior and post dropout substantially increased with age as expected, and tended to be higher for men at younger ages.

4.2.7 Dropout effect on outcome (ME3)

4.2.7.1 Mean profiles of outcome by time of death

The graphs below show the average grip strength for groups of participants stratified by the measurement occasion of death; participants with complete data (7 observations, category named still in the cohort) are also displayed for comparison. The analyses were stratified by sex and age group. The 70-80 and the 80+ age groups were merged due to the small number of participants that entered the study at an age older than 80.

Participants that die during the study have, from inclusion, lower values of grip strength compared to others, especially among men.

4.2.7.2 Mean profiles of outcome by time to loss to follow-up

A similar analysis was also performed stratifying the participants by the measurement occasion of the last available interview, if later interviews were missing (even though it is possible that participants will participate again in future waves, as they have not all been excluded from the study and intermittent missingness is possible). Participants that died during the study are excluded from the graph. Participants in the category Complete include those with complete information (7 available measurements).

The difference in mean outcome between complete and incomplete cases due to definitive missingness is smaller compared to what was observed for death and specific trends are not observed.

4.3 Univariate descriptions

4.3.1 Description of variables at baseline (U1)

Here we describe the distribution of the outcome and of the explanatory variables at baseline.

The overall summary of all the variables from AS at baseline (categorical and numerical) is given in the table below. We report the distribution of the physical activity variables using four and two levels only in this summary. Later only the binary variables that will be used in modelling are summarized.

Overall characteristics at baseline.
Variable                                N      N=5452
gender : Female                         5452   0.53 2869/5452
age_int                                 5452   53.0 60.0 70.0
                                               62.5 ± 10.3
age_int_cat : 50-59                     5452   0.47 2576/5452
  60-69                                        0.28 1502/5452
  70-80                                        0.19 1012/5452
  80+                                          0.07 362/5452
weight                                  5361   65.0 75.0 85.0
                                               76.4 ± 15.2
height_imp                              5418   165.00 171.00 178.00
                                               171.42 ± 9.13
education_imp : Low                     5428   0.22 1191/5428
  Medium                                       0.39 2130/5428
  High                                         0.39 2107/5428
pa_vig : More than once a week          5423   0.46 2519/5423
  Once a week                                  0.14 755/5423
  One to three times a month                   0.07 368/5423
  Hardly ever, or never                        0.33 1781/5423
pa_vig_freq                             5423   0.6 3274/5423
pa_low : More than once a week          5422   0.81 4400/5422
  Once a week                                  0.09 486/5422
  One to three times a month                   0.03 172/5422
  Hardly ever, or never                        0.07 364/5422
pa_low_freq                             5422   0.9 4886/5422
cusmoke_imp : Yes                       5423   0.26 1395/5423
maxgrip                                 5272   28.0 35.0 47.0
                                               37.1 ± 12.9
a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. x ± s represents X ± 1 SD. N is the number of non-missing values.

At the baseline interview most participants were in the younger age groups; the vast majority (90%) reported low-intensity physical activity at least once a week, and 60% reported vigorous physical activity at least once a week. About a quarter were smokers, medium and high education were about equally common (39% each), and there were slightly more women than men.

The distribution of the numerical variables is reported also graphically (the graphical display of categorical variables is omitted).

4.3.1.1 Graphical display for numerical variables at baseline

4.3.1.1.1 Age

The distribution of age at baseline was positively skewed (right-skewed).

4.3.1.1.2 Weight

The variable was reported with digit preference (values ending with 0 and 5 were more frequent than expected)

4.3.1.1.3 Height

The variable was reported with digit preference (values ending with 0 and 5 were more frequent than expected)

4.3.1.1.4 Grip strength

The variable was reported with digit preference (values ending with 0 and 5 were more frequent than expected); the distribution is bimodal, reflecting the large difference in the location of the distribution for men and women.

The characteristics observed at baseline were observed also at following measurement occasions.

4.3.1.1.5 Further exploration of digit preference for grip strength

We present the grip strength data with barplots, where the bars of the values with numbers ending with 0 or 5 are plotted in red. Here we display all the available measurements (data by wave were displayed previously).

All the peaks that deviate from the expected shape of the distribution are associated with values that end with 0 or 5.
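Digit preference can also be summarized numerically by comparing the share of values ending in 0 or 5 with the 20% that would be expected if terminal digits were uniformly distributed. The sketch below is in Python for illustration only; the function name and the toy values are assumptions, and measurements are assumed to be recorded as whole numbers.

```python
from collections import Counter


def terminal_digit_share(values, digits=(0, 5)):
    """Count terminal digits and return the share of values ending in `digits`.

    Missing values (None) are ignored; values are rounded to whole numbers.
    """
    whole = [int(round(v)) for v in values if v is not None]
    if not whole:
        return Counter(), None
    counts = Counter(v % 10 for v in whole)
    share = sum(counts[d] for d in digits) / len(whole)
    return counts, share


# Hypothetical grip strength readings (kg), not SHARE data.
toy = [30, 35, 41, 40, 28, 45, 33, 50, None, 27]
counts, share = terminal_digit_share(toy)
expected_share = 2 / 10  # two of the ten possible terminal digits
```

A share well above the expected 0.2 points to rounding of the readings to multiples of 5.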

4.3.2 Description of the time varying variables at later times (U2)

Here we summarize the longitudinal data of outcome and time-varying independent variables, stratifying the summary statistics by wave. Note that as wave is the time metric of the data collection process, the summaries stratified by wave can be used for the identification of data collection problems. The longitudinal trends of the time varying variables are summarized later (L2 for the outcome and L4 for the time-varying variables).

The digit preference was observed in all waves for weight and the outcome; the proportions did not vary greatly for categorical variables. The changes for age were described in previous sections.

Overall baseline characteristics across waves.
Variable              N      Wave 1 (N=1596)     Wave 2 (N=2487)     Wave 3 (N=1979)     Wave 4 (N=2112)     Wave 5 (N=3919)     Wave 6 (N=3514)     Wave 7 (N=3025)
gender : Female       18632  0.53 850/1596       0.53 1330/2487      0.54 1069/1979      0.53 1122/2112      0.53 2071/3919      0.53 1858/3514      0.53 1604/3025
age_int               18632  56.00 62.00 72.00   56.00 63.00 72.00   58.00 64.00 73.00   57.00 64.00 72.00   57.00 64.00 72.00   58.00 65.00 72.00   60.00 66.00 73.00
                             64.40 ± 10.58       64.53 ± 10.30       65.79 ± 9.93        65.11 ± 10.53       65.39 ± 10.08       65.77 ± 10.03       67.23 ± 9.52
age_int_cat : 50-59   18632  0.40 644/1596       0.38 946/2487       0.32 641/1979       0.36 754/2112       0.34 1317/3919      0.32 1121/3514      0.25 748/3025
  60-69                      0.29 455/1596       0.31 783/2487       0.35 687/1979       0.33 707/2112       0.35 1376/3919      0.35 1231/3514      0.37 1122/3025
  70-80                      0.22 353/1596       0.22 538/2487       0.23 448/1979       0.21 438/2112       0.22 859/3919       0.23 820/3514       0.28 845/3025
  80+                        0.09 144/1596       0.09 220/2487       0.10 203/1979       0.10 213/2112       0.09 367/3919       0.10 342/3514       0.10 310/3025
weight                16356  65.0 74.0 84.0      65.0 75.0 85.0      -                   65.0 75.0 85.0      65.0 75.0 85.0      65.0 76.0 86.5      66.0 76.0 87.0
                             74.7 ± 14.6         75.5 ± 14.7         -                   76.4 ± 15.3         76.6 ± 15.4         77.2 ± 15.8         77.8 ± 15.9
pa_vig_freq           14709  0.60 956/1591       0.50 1226/2430      -                   0.53 1097/2077      0.61 2405/3913      0.62 2178/3509      0.59 704/1189
pa_low_freq           14709  0.88 1400/1592      0.89 2165/2430      -                   0.89 1843/2077      0.90 3510/3911      0.91 3182/3509      0.88 1042/1190
maxgrip               17929  26.0 34.0 46.0      25.0 32.0 44.0      27.0 34.0 46.0      27.0 35.0 47.0      27.0 35.0 47.0      28.0 35.0 47.0      27.0 35.0 46.0
                             36.1 ± 13.2         34.6 ± 12.6         36.2 ± 12.8         37.2 ± 12.9         36.9 ± 12.4         37.0 ± 12.4         36.7 ± 12.1
a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. x ± s represents X ± 1 SD. N is the number of non-missing values.

The distributions of the numerical variables across waves are also presented graphically.

4.4 Multivariate description of data

4.4.1 Associations at baseline with structural variables (V1)

Here we explore the associations between explanatory variables measured at baseline and age and sex. We present the association of age and the categorical explanatory variables plotting the smoothed relationship between age and the value of the variable, stratifying by sex (categorical variables have two categories and are internally coded as 0/1, the smoothed relationship is obtained using the geom_smooth() function, method: gam). The association between education and age and sex was extensively explored in ME1 and therefore it is not presented here.

Vigorous physical activity in men decreases more sharply after age 60, while the decrease in vigorous PA seems more linear for women. Moderate physical activity remains stable up to approximately 70 years and decreases sharply afterwards; men and women show a similar association between activity and age. The proportion of smokers is low at older ages, and women have a smaller probability of smoking. Weight and height at baseline are negatively associated with age. Besides ageing, this might be due to a cohort effect, which is further explored in LE1.

4.4.2 Independent variables - Correlation (V2)

We explore the correlation between explanatory variables at baseline.

4.4.2.1 Overall correlation at baseline

Females on average have lower values of all the variables, as do older participants. The two types of physical activity are positively correlated, as are height and weight (to a larger extent). Age is negatively associated with all the explanatory variables.

4.4.2.1.1 At baseline, stratified by sex

The association between education and age was larger for women. As we are considering age and education at baseline, this is most likely due to a cohort effect (further explored in LE1).

Males

4.4.2.2 Additional explorations

Here we use all the observed data (with repeated measurements) to explore the association between some of the variables, namely height and weight.

Data cleaning performed before data screening removed very low values of height (even if they were considered plausible by the data cleaning performed within SHARE - see details in the import file), but some low values of height might still be due to errors.

The association between the two variables is as expected.

Some very low values of height do not seem consistent with the weight values. For these data points the BMI is also large in some cases.
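
The consistency check described here can be sketched by computing BMI (weight in kg divided by squared height in m) and flagging records where it is very large. This is only an illustration: the function names, the record layout, and the screening limit of 60 are hypothetical and are not taken from the report or from the SHARE data cleaning rules.

```python
def bmi(weight_kg, height_cm):
    """Body mass index: weight (kg) divided by squared height (m)."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2

def flag_inconsistent(records, bmi_limit=60.0):
    """Return the records whose BMI exceeds a (hypothetical) screening limit."""
    flagged = []
    for rec in records:
        value = bmi(rec["weight"], rec["height"])
        if value > bmi_limit:
            flagged.append({**rec, "bmi": round(value, 1)})
    return flagged

# Made-up records: the second combines an ordinary weight with a very low height.
records = [
    {"id": "a", "weight": 82.0, "height": 178.0},
    {"id": "b", "weight": 70.0, "height": 100.0},
]
for rec in flag_inconsistent(records):
    print(rec)
```

Flagged records are candidates for inspection rather than automatic removal, since a low height can also be genuine.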

4.4.3 Interactions between explanatory variables (V3)

The AS envisions the use of interactions between age and all time-fixed explanatory variables (sex, education, height); the main interest will be in the interpretation of the interaction between sex and functions of age. The descriptive statistics of all the explanatory variables stratified by age group and sex are reported in section V1.

4.4.3.1 Association between age and weight, stratified by physical activity

The possible interaction between age and status of vigorous/low-intensity activity with respect to weight is explored here, as it might be of interest to domain experts. The decline of weight with age might be slightly less pronounced among those who perform vigorous physical activity.

Scatter plot by vigorous physical activity and sex

Scatter plot by low physical activity and gender

4.4.4 Stratification (VE1)

We stratify the univariate descriptions of the data by sex and age group first; we explore also the stratification by baseline wave.

4.4.4.1 Baseline measurements stratified by sex

We limit the exploration to baseline measurements (as the wave by wave exploration is conducted on complete data in VE1). The results reported below with tables and graphs indicate the following.

Females and males differed substantially in the distribution of height, weight, vigorous (but not low-intensity) physical activity, and education. Age was similar.

When data were stratified by sex, the distribution of grip strength was no longer asymmetric and bimodal, and it seems appropriate to assume a Gaussian distribution; digit preference was visible despite the automated method of measurement.

Baseline characteristics by sex.
Variable | N | Male (N=2583) | Female (N=2869)
Wave : Wave 1 | 5452 | 0.29 746/2583 | 0.30 850/2869
  Wave 2 | | 0.23 603/2583 | 0.24 699/2869
  Wave 4 | | 0.08 215/2583 | 0.07 199/2869
  Wave 5 | | 0.34 885/2583 | 0.35 998/2869
  Wave 6 | | 0.05 134/2583 | 0.04 123/2869
age_int | 5452 | 53.00 60.00 69.00; 62.24 ± 9.92 | 53.00 60.00 70.00; 62.77 ± 10.66
age_int_cat : 50-59 | 5452 | 0.48 1230/2583 | 0.47 1346/2869
  60-69 | | 0.28 734/2583 | 0.27 768/2869
  70-80 | | 0.18 472/2583 | 0.19 540/2869
  80+ | | 0.06 147/2583 | 0.07 215/2869
weight | 5361 | 75.0 82.0 92.0; 84.0 ± 13.6 | 60.0 68.0 77.0; 69.4 ± 13.2
height_imp | 5418 | 173.00 178.00 183.00; 178.09 ± 7.07 | 161.00 165.00 170.00; 165.41 ± 6.11
education_imp : Low | 5428 | 0.15 397/2569 | 0.28 794/2859
  Medium | | 0.47 1204/2569 | 0.32 926/2859
  High | | 0.38 968/2569 | 0.40 1139/2859
pa_vig_freq | 5423 | 0.64 1632/2566 | 0.57 1642/2857
pa_low_freq | 5422 | 0.91 2331/2566 | 0.89 2555/2856
cusmoke_imp : Yes | 5423 | 0.27 696/2567 | 0.24 699/2856
maxgrip | 5272 | 40.00 48.00 55.00; 47.09 ± 10.28 | 24.00 28.00 33.00; 28.02 ± 7.01
a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. x ± s represents X ± 1 SD. N is the number of non-missing values.
4.4.4.1.1 Graphical presentation of stratified distributions (by sex)
4.4.4.1.1.1 Grip strength

Figure 4.1: Distribution maxgrip by sex

4.4.4.1.1.2 Age

Figure 4.2: Distribution age by gender

4.4.4.1.1.3 Weight

Figure 4.3: Distribution weight by gender

4.4.4.1.1.4 Height

Figure 4.4: Distribution height by gender

4.4.4.2 Stratification based on grouped age at baseline and sex

As age is also the time metric from the AS, in this section we explore only some aspects (related to baseline measurement) of the association between age and the other variables, stratifying by sex. More detailed explorations are presented in the sections devoted to time trends. In most analyses participants are grouped in 10-year age groups.

The aim of this analysis is to identify independent variables that might be associated with age and sex.

Among women the association between age and education is stronger (older participants with lower education).

4.4.4.2.1 Females
Baseline characteristics by age category for females.
Variable | N | 50-59 (N=1346) | 60-69 (N=768) | 70-80 (N=540) | 80+ (N=215)
education_imp : Low | 2859 | 0.15 197/1341 | 0.27 208/766 | 0.47 253/539 | 0.64 136/213
  Medium | | 0.31 422/1341 | 0.37 283/766 | 0.31 166/539 | 0.26 55/213
  High | | 0.54 722/1341 | 0.36 275/766 | 0.22 120/539 | 0.10 22/213
pa_vig_freq | 2857 | 0.68 908/1344 | 0.59 454/765 | 0.44 236/538 | 0.21 44/210
pa_low_freq | 2856 | 0.93 1253/1344 | 0.93 710/765 | 0.85 455/537 | 0.65 137/210
cusmoke_imp : Yes | 2856 | 0.28 380/1344 | 0.23 177/765 | 0.21 112/538 | 0.14 30/209
weight | 2796 | 62.0 69.0 79.0; 71.1 ± 13.3 | 60.0 68.0 76.0; 69.4 ± 12.5 | 59.0 65.5 74.0; 67.5 ± 13.4 | 55.0 62.0 70.0; 63.2 ± 11.3
height_imp | 2848 | 163.00 167.00 171.00; 166.94 ± 5.97 | 161.00 165.00 169.00; 164.93 ± 5.83 | 160.00 164.00 168.00; 163.48 ± 5.67 | 158.00 162.00 167.00; 162.13 ± 6.14
maxgrip | 2760 | 28.00 31.00 35.00; 31.27 ± 6.11 | 24.00 28.00 31.00; 27.50 ± 5.85 | 20.00 24.00 27.00; 23.81 ± 5.79 | 15.00 19.00 22.00; 18.85 ± 5.29
a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. x ± s represents X ± 1 SD. N is the number of non-missing values.
4.4.4.2.2 Males
Baseline characteristics by age category for males.
Variable | N | 50-59 (N=1230) | 60-69 (N=734) | 70-80 (N=472) | 80+ (N=147)
education_imp : Low | 2569 | 0.13 153/1222 | 0.14 103/732 | 0.21 101/470 | 0.28 40/145
  Medium | | 0.47 571/1222 | 0.48 352/732 | 0.46 217/470 | 0.44 64/145
  High | | 0.41 498/1222 | 0.38 277/732 | 0.32 152/470 | 0.28 41/145
pa_vig_freq | 2566 | 0.72 879/1221 | 0.66 480/732 | 0.49 229/469 | 0.31 44/144
pa_low_freq | 2566 | 0.94 1151/1222 | 0.93 680/732 | 0.87 405/468 | 0.66 95/144
cusmoke_imp : Yes | 2567 | 0.31 375/1222 | 0.27 195/732 | 0.21 97/468 | 0.20 29/145
weight | 2565 | 76.0 85.0 95.0; 86.3 ± 13.8 | 75.0 83.0 90.0; 84.2 ± 13.2 | 71.5 80.0 88.0; 80.0 ± 12.3 | 70.0 75.0 80.0; 75.9 ± 11.0
height_imp | 2570 | 175.00 180.00 184.00; 179.70 ± 6.87 | 173.00 178.00 183.00; 178.22 ± 6.89 | 171.00 175.00 179.00; 175.16 ± 6.52 | 169.75 173.00 178.00; 173.26 ± 6.12
maxgrip | 2512 | 47.00 53.00 58.00; 51.95 ± 8.77 | 41.00 47.00 52.00; 46.59 ± 8.20 | 34.00 40.00 45.00; 39.47 ± 7.94 | 26.00 32.00 37.00; 31.60 ± 8.42
a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. x ± s represents X ± 1 SD. N is the number of non-missing values.
4.4.4.2.3 Graphical presentation of stratified distributions of numerical variables
4.4.4.2.3.1 Weight

Figure 4.5: Distribution weight by age (categorical)

4.4.4.2.3.2 Height

Figure 4.6: Distribution height by age (categorical)

4.4.4.3 Stratification by wave of the baseline measurements

The overall summary of baseline measurements over waves is given in the table below (participants can be included in the study at different waves).

Overall baseline characteristics across waves.
Variable | N | Wave 1 (N=1596) | Wave 2 (N=1302) | Wave 4 (N=414) | Wave 5 (N=1883) | Wave 6 (N=257)
gender : Female | 5452 | 0.53 850/1596 | 0.54 699/1302 | 0.48 199/414 | 0.53 998/1883 | 0.48 123/257
age_int | 5452 | 56.00 62.00 72.00; 64.40 ± 10.58 | 54.00 61.00 70.00; 62.77 ± 10.01 | 51.00 52.00 54.00; 53.24 ± 4.04 | 56.00 63.00 71.00; 63.98 ± 10.02 | 51.00 52.00 52.00; 53.78 ± 6.43
age_int_cat : 50-59 | 5452 | 0.40 644/1596 | 0.44 576/1302 | 0.94 390/414 | 0.39 737/1883 | 0.89 229/257
  60-69 | | 0.29 455/1596 | 0.30 393/1302 | 0.05 19/414 | 0.33 617/1883 | 0.07 18/257
  70-80 | | 0.22 353/1596 | 0.19 250/1302 | 0.01 4/414 | 0.21 398/1883 | 0.03 7/257
  80+ | | 0.09 144/1596 | 0.06 83/1302 | 0.00 1/414 | 0.07 131/1883 | 0.01 3/257
weight | 5361 | 65.0 74.0 84.0; 74.7 ± 14.6 | 65.0 75.0 85.0; 75.5 ± 14.3 | 68.0 78.0 90.0; 80.1 ± 16.9 | 65.0 75.0 86.0; 76.9 ± 15.6 | 70.0 80.0 90.0; 80.8 ± 15.4
height_imp | 5418 | 164.00 170.00 177.00; 170.46 ± 8.96 | 165.00 170.00 177.00; 171.05 ± 9.01 | 167.00 174.00 180.00; 173.94 ± 8.86 | 165.00 171.00 178.00; 171.52 ± 9.20 | 168.00 174.00 181.00; 174.46 ± 9.37
education_imp : Low | 5428 | 0.26 406/1586 | 0.22 284/1293 | 0.10 42/410 | 0.22 422/1882 | 0.14 37/257
  Medium | | 0.44 696/1586 | 0.38 494/1293 | 0.35 143/410 | 0.37 698/1882 | 0.39 99/257
  High | | 0.31 484/1586 | 0.40 515/1293 | 0.55 225/410 | 0.40 762/1882 | 0.47 121/257
pa_vig_freq | 5423 | 0.60 956/1591 | 0.50 636/1284 | 0.67 278/413 | 0.65 1215/1879 | 0.74 189/256
pa_low_freq | 5422 | 0.88 1400/1592 | 0.89 1143/1284 | 0.94 388/413 | 0.92 1719/1877 | 0.92 236/256
cusmoke_imp : Yes | 5423 | 0.31 499/1590 | 0.28 356/1284 | 0.26 106/413 | 0.20 380/1879 | 0.21 54/257
maxgrip | 5272 | 26.0 34.0 46.0; 36.1 ± 13.2 | 25.0 33.0 44.0; 35.0 ± 12.6 | 33.0 43.0 55.0; 43.7 ± 13.0 | 28.0 35.0 47.0; 37.1 ± 12.2 | 33.0 40.0 54.8; 43.0 ± 12.5
a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. x ± s represents X ± 1 SD. N is the number of non-missing values.

Because the study design envisioned refreshment samples limited to the younger age groups, the participants first included in Wave 4 or 6 differed from those included in the other waves: they were substantially younger and were more frequently male. The differences in age and gender should explain the differences in the other variables: higher education and higher values for weight, height, and grip strength.

The numerical variables are also summarized graphically.

4.4.4.4 Age across baseline waves

4.4.4.5 Weight across baseline waves

4.4.4.6 Height across baseline waves

4.4.4.7 Grip strength across baseline waves

4.5 Longitudinal aspects

4.5.1 Outcome variable - Profiles (L1)

Here the aim is to visualize the individual profiles of the outcome for the participants.

The number of subjects is very large, and profile plots of grip strength do not clearly convey the information about individual variability. To visualize the profiles effectively we use different strategies: selected subgroups of participants (100 per group, stratifying the plots by sex and age group) and different time metrics (age or measurement occasion). Interactive plots are also available (see the separate output page devoted to interactive plots).

4.5.1.1 Age as time metric

Overall, the profile plots highlight the trend towards diminishing grip strength with age, and the rate of change seems to accelerate with age (the slope is steeper at later ages than at the beginning). Older participants are followed up for shorter times. Substantial increases or decreases in grip strength between measurements are possible. The variability of the outcome tends to decrease at later measurement occasions, especially in the older age groups.

All profiles

Subsets of profiles

Here we display the profiles of approximately 400 individuals for each sex group.

4.5.1.2 Measurement occasion as time metric

Here we show the profile plots by measurement occasion. In our case study the plots based on measurement occasion and stratified by age group are more informative than those based on age, as participants enter the study at different ages. Even though age was included as a continuous time metric in the analysis strategy, a summary stratified by ten-year groups can serve as a quick overview of the longitudinal trends by age. The plots based on subsets are more easily interpretable also with this time metric.

4.5.1.2.1 All profile plots

4.5.1.3 Subsets of profile plots

These profile plots of grip strength include 100 subjects for each age/sex category, chosen so that their baseline values of grip strength lie at given quantiles of the distribution (100 quantiles, 0.00001 to 1, by 0.01). The plots include only subjects with at least three valid measurements (this can be changed). This type of plot substitutes the classical profile plot in this application.

We also show the profiles of the participants with complete follow-up (7 measurements).
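
The idea of picking subjects spread over the baseline distribution can be sketched as follows. This is an illustrative approximation, not the selection code used for the report: the function name, the tuple layout, and the use of evenly spaced ranks in place of the exact quantile grid are assumptions. The function would be applied separately within each age/sex category.

```python
def select_by_quantiles(subjects, n_select=100, min_measurements=3):
    """Pick up to n_select subjects spread evenly over the baseline distribution.

    subjects: list of (subject_id, baseline_value, n_valid_measurements).
    """
    # Keep only subjects with enough valid measurements, ordered by baseline value.
    eligible = sorted(
        (s for s in subjects if s[2] >= min_measurements),
        key=lambda s: s[1],
    )
    if len(eligible) <= n_select:
        return [s[0] for s in eligible]
    # Evenly spaced ranks from the smallest to the largest baseline value
    # (a stand-in for evaluating a grid of quantiles).
    last = len(eligible) - 1
    steps = max(n_select - 1, 1)
    ranks = sorted({round(k * last / steps) for k in range(n_select)})
    return [eligible[r][0] for r in ranks]

# Made-up stratum: 1000 eligible subjects plus 10 with too few measurements.
stratum = [(i, float(i), 3) for i in range(1000)]
stratum += [(1000 + i, 5.0, 2) for i in range(10)]
chosen = select_by_quantiles(stratum)
print(len(chosen), chosen[:3], chosen[-1])
```

Selecting by rank rather than at random keeps both extremes and the middle of the baseline distribution visible in the plot.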

4.5.3 Outcome variable - Correlation and variability (L3)

Here we use complete pairs of observations and the Pearson correlation to quantify the correlation between measurements taken in different waves or at different measurement occasions.
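
A minimal sketch of how a matrix in the layout used in this section (correlations above the diagonal, SD on the diagonal, covariances below it) could be assembled from complete pairs. It is not the report's code: the function names and sample values are hypothetical, and it assumes that each pair of columns has at least two complete pairs and that the diagonal SD uses all non-missing values of that column.

```python
import math

def pairwise_stats(x, y):
    """Covariance and Pearson correlation from pairs where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs) / (n - 1)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs) / (n - 1)
    return cov, cov / math.sqrt(var_x * var_y)

def summary_matrix(columns):
    """columns: equal-length lists, one per wave/occasion; None marks a missing value."""
    k = len(columns)
    out = [[None] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):
            cov, corr = pairwise_stats(columns[i], columns[j])
            if i == j:
                out[i][i] = math.sqrt(cov)  # SD on the diagonal
            else:
                out[i][j] = corr            # correlation above the diagonal
                out[j][i] = cov             # covariance below the diagonal
    return out

# Made-up grip strength values for four participants over three waves.
waves = [
    [40.0, 30.0, 50.0, None],
    [38.0, 29.0, 47.0, 35.0],
    [35.0, None, 45.0, 33.0],
]
for row in summary_matrix(waves):
    print([round(v, 2) for v in row])
```

Because each off-diagonal entry uses its own set of complete pairs, a covariance in such a matrix need not equal the correlation multiplied by the two diagonal SDs.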

The following explorations evaluate the correlations using waves, measurement occasions, time since baseline, and age as time metrics. These explorations can be useful for determining the characteristics of the outcome based on different time metrics. Using waves we can identify some systematic errors due to wave, while measurement occasion/age is more directly related to the research question (decline of grip strength in time/with age).

The variability of the outcome at different ages is explored only for age as a time metric.

4.5.3.1 Wave as time metric

The correlations between subsequent measurements are very large (about 0.90) and decrease slightly for larger time differences.

The large correlations are driven by the separation of the values of males and females. Below are shown the correlation matrices for males and females, separately, and the scatterplots of the measurements.

The correlations are slightly lower for females than for males. It is interesting to note that the correlation between measurements taken further apart decreases more substantially if the sexes are analyzed separately.

Matrix with correlations (above the diagonal), SD (on the diagonal) and covariances (under the diagonal), males

Table 4.26: Correlation/SD/covariances of grip strength across waves, males
Wave 1 Wave 2 Wave 3 Wave 4 Wave 5 Wave 6 Wave 7
Wave 1 10.4 0.82 0.81 0.79 0.74 0.74 0.71
Wave 2 80.8 10.16 0.84 0.84 0.79 0.77 0.75
Wave 3 78.7 79.39 9.97 0.88 0.83 0.80 0.79
Wave 4 73.3 77.13 85.30 10.29 0.86 0.84 0.84
Wave 5 69.7 67.94 75.49 83.67 9.91 0.87 0.85
Wave 6 64.2 64.78 68.31 77.01 77.18 9.69 0.87
Wave 7 57.7 57.11 62.50 70.22 69.34 72.31 9.29

Matrix with correlations (above the diagonal), SD (on the diagonal) and covariances (under the diagonal), females

Table 4.27: Correlation/SD/covariances of grip strength across waves, females
Wave 1 Wave 2 Wave 3 Wave 4 Wave 5 Wave 6 Wave 7
Wave 1 7.26 0.76 0.73 0.73 0.73 0.66 0.59
Wave 2 35.01 6.81 0.80 0.80 0.79 0.75 0.72
Wave 3 33.12 35.65 6.88 0.81 0.78 0.75 0.72
Wave 4 31.54 33.31 34.68 6.76 0.85 0.82 0.78
Wave 5 29.07 31.06 31.37 34.68 6.52 0.82 0.81
Wave 6 25.77 30.18 30.80 33.75 33.45 6.64 0.82
Wave 7 22.19 26.97 27.12 29.78 30.37 32.40 6.29

Generalized pairs plot

4.5.3.2 Measurement occasion as time metric

Correlation matrix

Matrix with correlations (above the diagonal), SD (on the diagonal) and covariances (under the diagonal)

Table 4.28: Correlation/SD/covariances of grip strength across measurement occasions, all
M1 M2 M3 M4 M5 M6 M7
M1 12.9 0.92 0.92 0.91 0.89 0.88 0.87
M2 148.2 12.77 0.93 0.92 0.91 0.90 0.90
M3 142.9 143.15 12.40 0.94 0.93 0.92 0.91
M4 143.9 146.48 146.41 12.38 0.94 0.93 0.93
M5 135.3 138.36 137.43 137.07 12.15 0.94 0.94
M6 130.3 133.87 131.54 131.04 132.94 11.72 0.94
M7 135.2 137.11 138.77 135.46 137.06 134.60 11.96

Separate for males and females

The variance decreased with measurement occasion, and the correlations decreased with larger time lags.

Males

Table 4.29: Correlation/SD/covariances of grip strength across measurement occasions, males
M1 M2 M3 M4 M5 M6 M7
M1 10.3 0.83 0.82 0.79 0.74 0.73 0.71
M2 78.8 9.85 0.86 0.83 0.80 0.78 0.76
M3 75.9 78.06 9.78 0.87 0.84 0.81 0.82
M4 71.1 75.19 82.14 9.66 0.87 0.83 0.85
M5 66.0 70.77 77.77 78.94 9.81 0.86 0.87
M6 59.6 64.57 66.46 65.86 70.82 9.28 0.89
M7 57.7 59.17 64.78 67.01 71.27 72.79 9.05

Females

Table 4.29: Correlation/SD/covariances of grip strength across measurement occasions, females
M1 M2 M3 M4 M5 M6 M7
M1 7.01 0.78 0.78 0.77 0.74 0.71 0.59
M2 36.06 6.92 0.81 0.79 0.75 0.74 0.65
M3 33.93 34.21 6.52 0.84 0.80 0.78 0.69
M4 34.18 33.73 34.99 6.45 0.84 0.80 0.70
M5 30.68 30.67 31.56 31.29 6.43 0.85 0.78
M6 27.92 28.56 28.90 28.14 32.61 6.18 0.78
M7 22.19 21.45 23.11 21.59 24.64 26.18 5.92

4.5.3.3 Age as time metric (two-year groups, from 50 years old)

Data were grouped in two-year categories to obtain bigger groups. Two years were used because the difference between waves is usually two years and consecutive measurements at the individual level are usually taken every two years. Only estimates based on at least 20 observations are shown.

4.5.3.4 Correlations

Note that here we use age_int, which is an integer value.

The correlation between grip strength measured at consecutive ages is very large, but it decreases with age for measurements that are far apart. Measurements for younger participants are more correlated than those for older participants.

Here too, the large correlations are driven by the separation of the values of males and females.

As an example, see below the scatterplots of the values of grip strength for the individuals aged between 50 and 59 (grouped in two-year categories), by sex. The within-sex correlations are much weaker than the overall correlations.

Below are the complete correlation matrices separately for the two sexes and the boxplots of the correlations. Note that the estimates appear less stable, as they are based on smaller groups. Only estimates based on more than 20 observations are displayed.

In the separate analyses the correlations also appear to diminish as the age difference increases.

The correlations are also shown in an alternative graphical display that makes numerical comparisons easier.

The estimated correlations for large lags appear very variable, especially for males and for the oldest participants. The correlations decrease at larger lags.

4.5.3.5 Variability

The graph previously displayed in the longitudinal trends (L2) domain can be used also to assess how the variability of the measurements varies with age - for example, to identify possible problems with the hypothesis of constant variance.

The graph below shows in a single display the average, standard deviation and coefficient of variation of the outcome, grouping the participants in two-year groups. The SD decreases with age, as does the mean, while the CV increases.

Note that these graphs are produced using all the longitudinal data; the findings based on the baseline data are similar (trends in SD are less visible at older ages, possibly due to smaller sample sizes).
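
The three quantities in this graph are linked by CV = SD / mean, so the CV rises whenever the mean falls proportionally faster than the SD. A minimal sketch of the grouped computation follows; it is not the report's code, and the function name, the argument defaults (two-year groups from age 50, at least 20 observations per group) and the sample data are assumptions chosen to mirror the description in this section.

```python
import statistics
from collections import defaultdict

def variability_by_age(ages, values, width=2, start=50, min_n=20):
    """Mean, SD and CV of the outcome by age group; None marks a missing value.

    Ages are assumed to be integers (as with age_int).
    """
    groups = defaultdict(list)
    for age, value in zip(ages, values):
        if value is None or age < start:
            continue
        lower = start + ((age - start) // width) * width
        groups[lower].append(value)
    rows = []
    for lower in sorted(groups):
        obs = groups[lower]
        if len(obs) < min_n:  # skip estimates based on too few observations
            continue
        mean = statistics.mean(obs)
        sd = statistics.stdev(obs)
        rows.append({
            "age_group": f"{lower}-{lower + width - 1}",
            "n": len(obs),
            "mean": mean,
            "sd": sd,
            "cv": sd / mean,
        })
    return rows

# Made-up data: 20 observations at ages 50-51, only 5 at age 52.
ages = [50, 51] * 10 + [52] * 5
values = [40.0, 44.0] * 10 + [30.0] * 5
for row in variability_by_age(ages, values):
    print(row)
```

As a check against the baseline tables by ten-year age group reported above, for females the SD falls from 6.11 (ages 50-59) to 5.29 (80+) while the mean falls from 31.27 to 18.85, so the CV rises from about 0.20 to about 0.28.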

4.5.5 Evaluation of possible age-cohort effects (LE1)

The following graphs show the smoothed association between age and grip strength, evaluated using baseline measurements (blue), all longitudinal data (f), age-cohort trajectories (red lines, grouping participants in 5 year groups based on their age at baseline). The graphs are shown separately for men and women.

4.5.5.1 Overall description

Here we define the birth cohort variable that will be used to explore the possible presence of cohort effects in some of the characteristics of the participants. Participants are grouped in 10-year groups, except for the oldest cohort (spanning 19 years because of small sample size). There is a strong association between age and birth cohort due to the design of the study. The association is present when analyzing all data (first graph) or just the first interview (second graph).

4.5.5.2 Association of birth cohort with outcome

The following graphs show the smoothed association between age and grip strength, evaluated using baseline measurements (black solid line), all longitudinal data (black dashed line), and year-of-birth-cohort trajectories (colored lines described in the legend). Participants are grouped in 5-year groups based on their year of birth; a larger grouping is used for the extreme years, where fewer participants were included. The graphs are shown separately for men and women.

There is a clear birth cohort effect.

In a similar way we also explored the longitudinal age effect grouping the participants that belonged to the same age group, defined in 5-year groups.

4.5.5.3 Association of birth cohort and physical activity

When the summaries of physical activity are stratified by birth cohort, we observe that there is not much decline with age for the younger cohorts, while the decline is very steep for the oldest cohort (which includes all the oldest participants). The cohorts differ in their engagement in vigorous PA. Among women the effect is different.

When the summary is stratified by birth cohort, we observe that there is not much decline with age for the younger participants (belonging to younger birth cohorts), while the decline is very steep for the oldest cohort (which includes all the oldest participants). The cohorts differ in their engagement in vigorous PA. The smoothed estimates by cohort are much more variable for women.

Explanation for the effect for women???

4.6 Output for data analysis

Outputted datasets

Data in long format: share1_withflags - ** add later **

The matrices with the information about the missing value structure are in wide format, each column indicates a Wave.

df.missing_cv (by Wave, wide format). Codes: -999: not yet included in the study; 1: interview available; -10: missing interview; -1000: out of sample/missing by design.

Additionally, the df.missing_5cat_cv data set distinguishes lost to follow-up and intermittent missingness (1: interview done; -10: lost to FU; -11: intermittent missingness; -12: lost to follow-up; -100: death; -1000: out of sample/missing by design; -999: not yet included in the study). The df.missing_6cat_cv data set makes the same distinction and also separates out of sample from missing by design (1: interview done; -10: lost to FU; -11: intermittent missingness; -12: lost to follow-up; -100: death; -1000: out of sample; -999: not yet included in the study; -1001: missing by design).

death.status.waves is the matrix with the Death/Alive/Unknown indication (by Wave, wide format). NAs in the measurement occasion matrix indicate that the measurement was not obtained because the study ended.
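
The distinction between intermittent missingness and loss to follow-up can be illustrated with a minimal sketch applied to one participant's row of wave codes. The rule used here is an assumption, since the report does not state how the codes were derived: a missing interview (-10) is treated as intermittent (-11) if an interview (1) is observed in a later wave, and as lost to follow-up (-12) otherwise. Deaths (-100) would need the separate death information and are not handled. The function name is hypothetical.

```python
def refine_missing(codes):
    """Split code -10 into -11 (intermittent) and -12 (lost to follow-up).

    codes: one participant's codes ordered by wave, using
    -999 (not yet included), 1 (interview), -10 (missing interview),
    -1000 (out of sample/missing by design).
    """
    refined = list(codes)
    for i, code in enumerate(codes):
        if code == -10:
            # Assumed rule: intermittent if any later wave has an interview.
            later_interview = any(c == 1 for c in codes[i + 1:])
            refined[i] = -11 if later_interview else -12
    return refined

# Made-up participant: enters in the second wave, misses one wave, returns, then drops out.
row = [-999, 1, -10, 1, -10, -10, -1000]
print(refine_missing(row))
```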

Complete coverscreen information (wide format) is also available and could be exported: cv.all